ABSTRACT


Every material has a unique electromagnetic reflectance/emission signature which
can be used to identify it.  Hyperspectral imagers, by collecting high spectral resolution
data, provide the ability to identify these spectral signatures.  Utilization and exploitation
of hyperspectral data is challenging because of the enormous data volume produced by
these imagers.  Most current processing and analysis techniques involve dimensionality
reduction, during which some information is lost.  This thesis demonstrates the ability of
neural networks and the Kohonen Self-Organizing Map to classify hyperspectral data.
The possibility of real-time processing is addressed.


TABLE OF CONTENTS

I.  INTRODUCTION
II.  NEURAL NETWORKS
  A.  HISTORY OF NEURAL NETWORKS
  B.  PRINCIPLES OF NEURAL NETS
    1.  Firing Rules
    2.  Learning Process
    3.  Error-Correction Learning: The Delta Rule
  C.  KOHONEN'S SELF-ORGANIZING MAP
    1.  Motivation
    2.  Biological Foundation
    3.  Network Description
      a.  Step 1
      b.  Step 2
      c.  Step 3
      d.  Step 4
      e.  Step 5
      f.  Step 6
      g.  Step 7
      h.  Step 8
III.  HYPERSPECTRAL IMAGERY
  A.  OVERVIEW
  B.  SENSORS
    1.  Advanced Airborne Hyperspectral Imaging System (AAHIS)
    2.  Spatially Modulated Imaging Fourier Transform Spectrometer (SMIFTS)
      a.  Fourier Transform Spectroscopy
      b.  FTVHSI Attributes
IV.  ANALYSIS AND RESULTS
  A.  SIMPLE TEST SPECTRA
    1.  Two Node Network
    2.  Four Node Network
    3.  Arrays of Larger Size
  B.  SIMPLE IMAGES
    1.  Black and White Image
    2.  Variable Grey Image
  C.  VARIABLE INTENSITIES OF SIMPLE TEST SPECTRA
  D.  REAL DATA
    1.  Camouflage Image
      a.  Sum of Absolute Differences
      b.  Correlation Coefficient
    2.  Operation Desert Radiance Image
    3.  Run Times
V.  CONCLUSIONS
APPENDIX A
APPENDIX B
REFERENCES
INITIAL DISTRIBUTION LIST


I.  INTRODUCTION


Remote  sensing  with  optical  sensors  has  evolved  from  panchromatic  to  multispectral 
systems,  such  as  LANDSAT,  over  the  last  two  decades.  These  systems  collect 
electromagnetic  data  over  a  spectral  range  of  interest,  typically  in  the  visible  and  near 
infrared.  Such  data  have  been  used  for  earth  resource  studies,  land  use,  and  agriculture 
since  the  1970's.  Such  data  allow  for  broad  categorization  of  remotely  sensed  regions. 
Higher resolution data offer the promise of more detailed information.  Hyperspectral
imaging  utilizes  a  set  of  images  each  of  which  covers  a  spectral  band  of  10  nanometers  or 
less.  This  narrow  band  collection  process  makes  it  possible  to  resolve  minerals,  types  of 
plants,  and  the  spectra  of  various  "cultural"  artifacts. 

Inherent in the multiple spectral band data collection process is an increase in the
data volume associated with each pixel and, hence, each image.  An image produced by
AVIRIS (a 224 band instrument) contains over thirty times the data contained in a
similar image produced by LANDSAT (a 7 band instrument).  This creates a problem for
storage and processing.  Most current hyperspectral data processing methods begin with
some form of dimensionality reduction, such as principal components, and none, as far as
the author knows, are done in real time.  Any time data reduction is done some of the
information is lost, and non-real-time processing implies that all data must be stored.  A
process which utilizes all of the data for material identification and then performs a
massive data reduction would be ideal.

Concurrent  with  the  progress  in  hyperspectral  imaging,  computer  technology  and 
our  understanding  of  the  biological  processes  involved  in  learning  and  intelligence  have 
made  significant  advances.  These  advancements  have  allowed  neural  networks  to 
progress  into  a  mature  technology.  They  are  useful  in  the  following  situations: 

1.  Capturing associations or discovering regularities within a set of patterns

2.  Where the volume, number of variables or diversity of the data is very great

3.  The relationships between variables are vaguely understood

4.  The relationships are difficult to describe adequately with conventional
approaches.  (Westervelt, Krzysik, and Seel, 1994)

Hyperspectral data fit all four of these criteria.

This  thesis  tries  to  attack  the  problem  of  analyzing  hyperspectral  images  by 
employing  neural  networks  with  the  Kohonen  Self-Organizing  Map,  which  has  been 
described  by  Stan  Openshaw  as  being  simple,  flexible,  and  capable  of  handling  immense 
quantities  of  data  (Openshaw,  1994).  It  is  broken  down  into  five  chapters  and  two 
appendixes.  Chapter  I,  the  INTRODUCTION,  is  followed  by  Chapter  II,  NEURAL 
NETWORKS,  which  addresses  the  history  of  neural  networks,  presents  an  introduction 
to  their  workings  and  concludes  with  a  discussion  of  Teuvo  Kohonen's  neural  network. 
Chapter  III,  HYPERSPECTRAL  IMAGERY,  defines  the  term  and  presents  a  description 
of  the  two  instruments  used  to  collect  the  data  analyzed  in  Chapter  IV,  ANALYSIS  AND 
RESULTS.  Chapter  V  contains  the  CONCLUSIONS  drawn  from  Chapter  IV. 

Appendix A contains all figures and Appendix B contains an IDL version of Kohonen's
network.

II.  NEURAL NETWORKS


A.  HISTORY  OF  NEURAL  NETWORKS 

Since  the  late  1980's,  the  field  of  neural  networks  has  garnered  significant  attention 
from  both  the  commercial  and  research  communities.  This  method  of  analyzing 
problems  has  gained  renewed  interest  as  computer  processing  speed  has  exploded.  This 
work  will  outline  the  history  of  Neural  Networks,  describe  the  basic  concepts  upon  which 
all Neural Networks are constructed and give a step by step description of the Kohonen
Self-Organizing Map.  In this report the terms 'neural networks' and 'neural computing'
are synonymous and are defined as:

The study of networks of adaptable nodes which, through a process of learning
from task examples, store experiential knowledge and make it available for use.
(Aleksander and Morton, 1990)

The  concept  is  not  new.  It  owes  its  origins  to  the  study  of  the  fundamental  cell  of  the 
living  brain:  the  neuron  (Harvey,  1994).  An  elementary  model  of  a  neuron  consists  of 
the  body  of  the  cell,  called  the  soma,  and  the  axon,  which  links  the  neurons  together. 

The  point  at  which  the  soma  and  the  axon  connect  is  known  as  the  synapse  or  synaptic 
connection.  When  properly  stimulated  by  the  synaptic  connections,  the  soma  fires, 
sending  out  a  small  electrical  pulse,  which  travels  down  the  axon  to  other  neurons 
(McCulloch  and  Pitts,  1943).  In  a  rudimentary  sense,  the  brain  learns  by  setting  a 
criterion  for  the  soma  to  fire  and  adjusting  the  "weights"  of  the  synaptic  connections 
through  experience.  Both  the  criterion  and  the  weights  are  adjusted  continuously.  In 
1943 neurophysiologist Warren McCulloch and logician Walter Pitts published A Logical
Calculus of the Ideas Immanent in Nervous Activity, which described the neural process
and developed a simple model of the neuron using variable resistors, which represented
the variable synaptic connections, and summing amplifiers, which represented the
operation of the neuron body.  This model was the product of over five years of research
conducted by a neural modeling community centered at the University of Chicago
(Haykin,  1994).  The  model  was  adopted  by  the  pioneers  of  neural  computing  and 
provides  the  basis  of  most  nets  being  discussed  today  (Aleksander  and  Morton,  1990). 

In 1949 neurophysiologist Donald Hebb published The Organization of Behavior, in
which an explicit statement of a physiological learning rule for synaptic modification was
presented  for  the  first  time  (Haykin,  1994).  He  put  forward  the  idea  that  a  group  of 
neurons  could  reverberate  in  different  patterns,  each  being  related  to  a  different 
experience  (Hebb,  1949).  Specifically,  Hebb  proposed  that  the  connectivity  of  the  brain 
is  continually  changing  as  an  organism  learns  differing  functional  tasks,  and  that  neural 
assemblies  are  created  by  such  changes  (Haykin,  1994).  He  introduced  his  "postulate  of 
learning",  which  states  that  the  strength  or  weight  of  the  synaptic  connection  between 
two  neurons  is  increased  by  the  repeated  activation  of  one  neuron  by  the  other  across  that 
synapse.  This  laid  the  seeds  for  a  model  of  dynamic  memory  and  increased  the  possible 
applications  of  neural  computing. 
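
Hebb's postulate can be illustrated with a very small sketch. The function name, the learning rate, and the use of 0/1 activity values are assumptions made for illustration only; they do not come from Hebb's text. The weight grows only when the sending and receiving neurons are active together.

```python
def hebbian_update(weight, pre, post, rate=0.1):
    """Return the synaptic weight after one Hebbian step.

    pre and post are the activities (0 or 1) of the sending and
    receiving neurons; rate is a hypothetical learning rate.
    """
    return weight + rate * pre * post


# Repeated activation of one neuron by the other strengthens the synapse.
w_active = 0.0
for _ in range(5):
    w_active = hebbian_update(w_active, pre=1, post=1)

# If the receiving neuron does not fire, the weight is unchanged.
w_idle = hebbian_update(0.0, pre=1, post=0)
```

In this sketch the weight only ever increases; practical Hebbian schemes usually add some form of decay or normalization, which is omitted here.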

A  major  milestone  was  reached  by  Frank  Rosenblatt  in  1958  with  the  creation  of  the 
perceptron,  a  word  he  coined  to  refer  to  a  class  of  simple  neuron-like  learning  networks 
that  were  capable  of  recognizing  images.  His  creation  was  based  on  the  McCulloch  and 
Pitts  model  discussed  above.  Rosenblatt  pioneered  two  techniques  of  fundamental 
importance  to  the  study  of  learning  in  neural  computing:  digital  computer  simulation  and 
formal mathematical analysis (Rumelhart and McClelland, 1986).  In his 1962 book
Principles of Neurodynamics, Rosenblatt describes what he thought he was doing as
follows:

Perceptrons are not intended to serve as detailed copies of any actual nervous
system.  They're simplified networks, designed to permit the study of lawful
relationships between the organization of a nerve net, the organization of its
environment, and the "psychological" performances of which it is capable.
Perceptrons might actually correspond to parts of more extended networks and
biological systems; in this case, the results obtained will be directly applicable.
More likely they represent extreme simplification of the central nervous system
in which some properties are exaggerated and others suppressed.  In this case,
successive perturbations and refinements of the system may yield a closer
approximation.

The main strength of this approach is that it permits meaningful questions to be
asked and answered about particular types of organizations, hypothetical memory
mechanisms, and neural models.  When exact analytical answers are unobtainable,
experimental methods, either with digital simulation or hardware models, are
employed.  The model is not the terminal result, but a starting point for exploratory
analysis of its behavior.  (Rosenblatt, 1962)

Rosenblatt's  perceptron  created  much  excitement  in  the  world  of  neural  computing  and  is 
a  fundamental  building  block  for  many  of  today's  functioning  neural  nets. 

Progress  in  neural  computing  continued  throughout  the  1960’s.  Widrow  and  Hoff 
introduced  the  "least  mean-square  (LMS)"  algorithm  in  1960  and  used  it  to  train  an 
adaptive  pattern  classification  machine  (called  Adaline  for  adaptive  linear  element).  This 
machine  was  constructed  for  the  purpose  of  illustrating  adaptive  behavior  and  artificial 
learning (Widrow and Hoff, 1960).  Except for the training procedure, it was identical to
the  perceptron.  Two  years  later,  Widrow  and  his  students  introduced  one  of  the  earliest 
attempts  at  a  trainable,  layered  neural  network  with  multiple  adaptive  elements.  They 
called  their  creation  Madaline  (multiple-Adaline)  (Haykin,  1994).  As  the  decade 
progressed  it  became  apparent  that  neural  nets  could  be  simulated  on  contemporary 
computers. 

Neural  computing  seemed  to  have  unlimited  potential  until  1969  when  Marvin 
Minsky  and  Seymour  Papert  published  their  book  'Perceptrons:  An  Introduction  to 
Computational  Geometry'.  This  book  questioned  the  ability  of  single  layer  perceptrons 
to  perform  some  simple  image  recognition  tasks.  The  book  described  two  problems, 
called  parity  and  connectedness,  which  perceptrons  could  not  perform  (Minsky  and 
Papert,  1969).  Parity  refers  to  whether  an  image  contains  an  odd  or  even  number  of 
distinct isolated parts.  Connectedness is best defined by an example.  The letter w is
connected whereas the letter i is not connected.  Both of these tasks can be easily solved
by conventional computing methods but Minsky and Papert rigorously showed that in
order  for  a  single  layer  perceptron  network  to  solve  these  problems  the  size  of  the 
network  had  to  grow  as  the  size  of  the  image  grew.  This  is  unacceptable  as  it  would  lead 
to  a  different  architecture  for  images  of  different  sizes.  These  and  related  problems 
became  known  as  hard  learning  problems. 

Several  factors  combined  to  cause  advancements  in  neural  computing  to  slow  to  a 
trickle  throughout  the  seventies.  First  and  foremost  was  the  elegance  with  which  Minsky 
and Papert defined the problems.  This made gathering interest and financial support very
difficult.  Another problem was that technology had yet to produce personal computers
and workstations for experimentation.  During this time many of the researchers deserted
the field in search of more promising areas.  This left only a handful of early pioneers
continuing to do research in neural computing, most of whom were from the psychology
and neuroscience communities.  From a physics and engineering perspective, we may
look  back  on  the  1970s  as  a  "decade  of  dormancy"  for  neural  networks  (Haykin,  1994). 

In spite of this lack of interest, one important concept did emerge during the 1970s:
competitive learning and "self-organizing maps".  The idea is to feed several
processing  nodes  with  the  same  inputs.  Each  time  an  input  is  presented,  the  node  which 
most  closely  represents  that  input  is  declared  the  winner.  It  and  its  neighbors  are  updated 
to  more  closely  resemble  that  input.  Eventually,  only  the  winner  is  updated.  The  result 
is  a  network  which  classifies  the  inputs  into  groups  (this  discussion  is  expanded  in 
Section  C  of  this  chapter).  The  first  computer  demonstration  of  this  process  was 
probably  done  by  von  der  Malsburg  in  1973  (Haykin,  1994).  In  1976,  Willshaw  and 
von  der  Malsburg  published  the  first  paper  on  self-organizing  maps. 
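
The competitive learning idea described above can be sketched in a few lines. This is a simplified illustration, not the procedure of von der Malsburg or Willshaw: it assumes squared Euclidean distance as the similarity measure and updates only the winning node (the final stage described above), leaving out the neighborhood update. The function names and the learning rate are hypothetical.

```python
def winner(nodes, x):
    """Index of the node whose weight vector is closest to input x."""
    return min(range(len(nodes)),
               key=lambda k: sum((w - a) ** 2 for w, a in zip(nodes[k], x)))


def train_competitive(nodes, inputs, rate=0.5, epochs=10):
    """Winner-take-all training: move only the winning node toward each input."""
    nodes = [list(n) for n in nodes]          # work on a copy
    for _ in range(epochs):
        for x in inputs:
            k = winner(nodes, x)
            nodes[k] = [w + rate * (a - w) for w, a in zip(nodes[k], x)]
    return nodes


# Two well-separated groups of inputs and two nodes with arbitrary starting weights.
samples = [[0, 0], [0, 1], [10, 10], [10, 11]]
trained = train_competitive([[1.0, 1.0], [9.0, 9.0]], samples)
groups = [winner(trained, x) for x in samples]
```

After training, each node sits inside one of the two groups, so the index returned by `winner` acts as a class label for any new input.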

John  Hopfield  of  the  California  Institute  of  Technology  was  responsible  for  the 
resurgence  of  interest,  especially  in  the  physics  and  engineering  communities,  in  the 
analysis of neural computing.  His 1982 paper entitled 'Neural Networks and Physical
Systems with Emergent Collective Properties' drew attention to two properties of
interconnected cells of simple non-linear devices: first, that such a system has stable
states  which  will  always  be  entered  if  the  net  is  started  in  similar  states  and,  second,  the 
fact  that  such  states  can  be  created  by  changing  the  strength  of  the  interconnections 
between  the  cells  (Haykin,  1994).  Much  of  the  fascination  of  this  paper  came  from  the 
realization  that  the  properties  he  identified  are  inherent  to  fully  interconnected  neural 
networks.  These  properties  are  known  as  associative  memory  to  computer  engineers 
(Aleksander  and  Morton,  1990).  His  analysis  is  derived  from  the  physical  concept  of 
energy.  This  energy  is  represented  by  successive  firings  of  the  net  and  is  determined  by 
the  strength  of  the  connections  and  the  thresholds  of  the  neurons.  He  offers  a  proof  that 
the  network  will  operate  by  minimizing  this  energy  when  settling  into  stable  patterns  of 
operation,  or  'energy  wells'  (Aleksander  and  Morton,  1990).  Each  successive  firing  of 
the  net  will  decrease  its  overall  energy  and  eventually  the  net  would  arrive  at  a  stable 
minimum  at  which  time  the  firing  pattern  would  remain  constant.  He  also  demonstrated 
that  the  connection  strengths  and  thresholds  can  be  calculated  so  as  to  create  these  stable 
energy  states.  This  class  of  fully  interconnected  neural  networks  with  feedback  attracted 
a  great  deal  of  attention  in  the  1980s  and  in  the  course  of  time  became  known  as  Hopfield 
networks  (Haykin,  1994). 
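
The energy argument can be made concrete with a minimal sketch. Several choices here are assumptions for illustration rather than details taken from Hopfield's paper as summarized above: node states are +1/-1, all thresholds are zero, the connection strengths are computed with the outer-product (Hebbian) storage rule, and the energy is written as E = -1/2 * sum over i, j of w[i][j] * s[i] * s[j].

```python
def store(patterns):
    """Compute symmetric connection strengths (zero diagonal) from +1/-1 patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def energy(w, s):
    """Network energy for state s, with all thresholds assumed to be zero."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(w, state, sweeps=3):
    """Fire the nodes one at a time and record the energy after every firing."""
    s = list(state)
    history = [energy(w, s)]
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if h >= 0 else -1
            history.append(energy(w, s))
    return s, history


stored = [1, -1, 1, -1, 1, -1]
noisy = [1, 1, 1, -1, 1, -1]           # second element flipped
settled, history = recall(store([stored]), noisy)
```

Starting from the corrupted pattern, the recorded energy never increases from one firing to the next, and the net settles into the stored pattern, which is the associative memory behavior described above.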

Another  significant  development  occurred  during  1982.  Teuvo  Kohonen  published  a 
paper  furthering  the  concept  and  usefulness  of  self-organizing  maps.  His  paper  has 
received  much  greater  attention  than  the  earlier  work  of  Willshaw  and  von  der  Malsburg 
(Haykin,  1994).  A  detailed  explanation  of  self-organization  and  the  Kohonen  Network  is 
presented  in  Section  C. 

Subsequent  investigation  of  the  Hopfield  Model  revealed  that  it  was  possible  to  train 
a  network.  Rules  such  as  the  Widrow-Hoff  (LMS)  procedure  could  be  used  to  make 
gradual  adjustments  to  the  net  parameters  until  the  wells  were  created.  This  eliminated 
the  need  to  solve  massive  sets  of  simultaneous  equations  by  conventional  methods  and 
implied that neural networks could be used as a tool to solve such sets of equations
(Aleksander and Morton, 1990).

The  Hopfield  Model  did  not  solve  all  problems  associated  with  neural  computing.  It 
quickly  became  apparent  that  false,  or  local,  minima  occurred.  It  also  did  not  solve  the 
problem  of  hard  learning. 

In  1986  Geoffrey  Hinton  and  Terry  Sejnowski  introduced  methods  of  solving  both  of 
these  problems  using  a  fully  interconnected  network.  In  order  to  solve  the  local  minima 
problem they introduced 'noise' to the Hopfield model and called their net the Boltzmann
machine (Aleksander and Morton, 1990).  This can be viewed as in Figure 2.1.  If enough
noise is added to the ball, it will move freely between the two minima.  If the level of
noise is decreased slowly enough the ball will eventually come to rest in the deeper of the two
minima.  This process is known as simulated annealing.  It is based on the proof by
Hopfield  that  the  global  energy  function  representing  the  network  can  be  minimized 
through  a  process  of  asynchronously  updating  the  nodes  (Rumelhart,  Hinton,  and 
McClelland,  1986).  As  for  the  problem  of  hard  learning,  Hinton  and  Sejnowski 
developed learning rules which allowed hidden nodes in a Boltzmann machine to be
trained.  Boltzmann machines of any complexity require enormous processing time and/or
speed  to  be  practical. 
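
The ball-and-two-minima picture of simulated annealing can be sketched as follows. This is a scalar analogue of the idea, not a Boltzmann machine: the double-well energy function, the acceptance rule (the commonly used Metropolis criterion), the cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def energy(x):
    """Illustrative double well: shallow minimum near x = +1, deeper one near x = -1."""
    return x ** 4 - 2 * x ** 2 + 0.5 * x


def anneal(x0, t_start=2.0, t_end=0.01, cooling=0.995, step=0.5, seed=0):
    """Shake the 'ball' with noise whose level (temperature) is lowered slowly."""
    rng = random.Random(seed)
    x, best = x0, x0
    t = t_start
    while t > t_end:
        candidate = x + rng.uniform(-step, step)
        delta = energy(candidate) - energy(x)
        # Downhill moves are always taken; uphill moves are taken with a
        # probability that shrinks as the noise level t is reduced.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = candidate
            if energy(x) < energy(best):
                best = x
        t *= cooling
    return x, best


# Start the ball in the shallow well.
final_x, best_x = anneal(x0=1.0)
```

At high noise levels the ball crosses the barrier between the wells freely; as the level falls, uphill moves become rare. If the cooling is too fast the ball can still be trapped in the shallow well, which is why the schedule must be slow.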

Minsky  and  Papert  had  stated  that  if  a  method  of  training  a  multiple  layer,  feed 
forward,  perceptron  based  network  could  be  found,  hard  learning  could  be  accomplished. 
In Parallel Distributed Processing, published in 1986, Rumelhart, Hinton and Williams
derived a learning algorithm which accomplished this task.  A similar generalization of
the algorithm was derived independently by Parker in 1985, and a roughly similar
learning algorithm was studied by LeCun, also in 1985.  This process became known as
'back-propagation' and is now the most popular learning algorithm for the training of
multilayer perceptrons (Haykin, 1994).

In  1988,  Linsker  described  a  new  principle  for  self-organization  in  a  perceptual 
network.  The principle is designed to preserve maximum information about input
activity patterns, subject to such constraints as synaptic connections and synapse dynamic
range.  A  similar  suggestion  had  been  made  independently  by  several  vision  researchers. 
However  it  was  Linsker  who  used  abstract  concepts  rooted  in  information  theory  to 
formulate  the  principle  of  maximum  information  preservation.  (Haykin,  1994) 

In the 1990s, much of the research involving neural networks has centered around
applications.  This  paper  will  attempt  to  contribute  in  this  area  by  applying  neural 
networks  to  hyperspectral  imagery  for  the  purpose  of  classification.  At  least  two 
previous  attempts  have  been  made.  Brown  and  DeRouin  have  published  several  papers 
including  Comparing  Neural  Network  Classifiers  and  Feature  Selection  for  Target 
Detection  in  Hyperspectral  Imagery  (1992),  which  investigated  neural  network 
applicability  to  target  detection  and  feature  selection  in  an  automatic  target  detection 
scenario.  Also  in  1992  Shen  and  Horblit  published  Application  of  Neural  Networks  to 
Hyper-Spectral  Image  Analysis  and  Interpretation,  which  examined  the  possibility  of 
classifying AVIRIS data.  These attempts are similar in that both use some method of
data reduction to decrease the dimensionality (each used the method of principal
components at some point), and then apply an error back propagation, feed forward
network to the reduced data set.  Brown and DeRouin used their network to produce a
yes or no determination for each pixel in the image, with a "yes" indicating that the pixel of
interest belonged to the category of interest, in this case camouflaged targets.  Shen and
Horblit  attempted  to  design  a  network  which  classified  the  scene  according  to  spectral 
similarity.  Both  of  these  papers  demonstrate,  to  varying  degrees,  the  applicability  of 
neural  networks  to  hyperspectral  imagery. 

The  resurgence  of  research  in  neural  net  technology  throughout  the  1980s  can  be 
directly  traced  to  Hopfield's  1982  paper  and  was  given  a  boost  by  the  publication  of 
Rumelhart and McClelland's book Parallel Distributed Processing.  As computer
processing  speed  continues  to  increase,  possible  applications  of  neural  computing  in  the 
science and engineering communities appear to be limited only by one's imagination.

B.  PRINCIPLES OF NEURAL NETS

Typically  neural  nets  are  organized  in  layers,  with  each  layer  consisting  of  n 
processing  nodes.  Figure  2.2  illustrates  properties  which  are  common  to  most  neural 
nodes.  Each node has a set of inputs ($x_1$ through $x_n$), each of which is modulated by an
adjustable weight ($w_1$ through $w_n$).  The bias input $x_0$ is fixed at a value of 1 and, when
combined with its adjustable weight $w_0$, acts as a threshold for the firing rule.  The
significance of this will be explained in Section 3.a of this chapter.  The node performs a
summation $v(k)$, where $k$ stands for the node number, on the weight-modulated inputs
according to:

$$v(k) = x_0 w_0 + x_1 w_1 + x_2 w_2 + \dots + x_n w_n \qquad (2.1)$$

The  node  operates  in  either  the  Teach  or  Use  mode.  This  is  determined  by  the  position 
of  a  switch.  While  in  the  Teach,  or  training  mode,  the  node  will  use  some  type  of 
'learning  rule'  which  modifies  the  weights  of  the  connections  according  to  a  comparison 
of  the  Teaching  Input  and  the  Output.  This  process  is  illustrated  in  Figure  2.3. 

Training  continues  for  either  a  defined  period  of  time  or  until  the  output  is  within  a 
specified  delta  of  the  Teaching  Input.  Once  training  is  complete  the  node  is  ready  to 
enter  the  Use  mode.  In  the  Use  mode  the  neuron  is  presented  with  input  patterns.  For 
any  input  resembling  the  Teach  Input  to  a  specified  degree,  the  node  will  respond  with 
the  'learned'  output. 
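
The weighted summation performed by a node can be sketched as below. The function and argument names are hypothetical, and the example weights are arbitrary values chosen for illustration; the only detail taken from the text is that the bias input is fixed at 1, so the bias weight enters the sum directly.

```python
def node_sum(inputs, weights, bias_weight):
    """Summation v(k) of the weight-modulated inputs of one node.

    The bias input is fixed at 1, so it contributes 1 * bias_weight.
    """
    return bias_weight + sum(x * w for x, w in zip(inputs, weights))


# Three inputs with arbitrary illustrative weights.
v = node_sum([1, 0, 1], [0.5, -0.2, 0.25], bias_weight=-0.5)
```

Because the bias weight is simply added to the weighted inputs, changing it shifts the level that the inputs must reach before a firing rule responds, which is why it behaves as an adjustable threshold.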

1.  Firing  Rules 

The  word  'firing'  is  borrowed  from  the  world  of  biology,  where  it  describes  the  act  of 
a neuron emitting a series of electrical pulses (Aleksander and Morton, 1990).  A 'firing
rule', also known as an activation function and represented by $\varphi(k)$, determines how a
node responds to a given input pattern.  In the following example, which is slightly
modified from Aleksander and Morton (1990), let a 1 represent a node which is firing and
a 0 one which is not firing.  Allow the rule to be as follows:

Take a collection of training patterns for a node, some of which cause it to fire (the
1-taught set of patterns) and others which prevent it from doing so (the 0-taught
set).  Then the patterns not in the collection cause the node to fire if, on comparison,
they have more input elements in common with the 'nearest' pattern in the 1-taught
set than with the 0-taught set.  If there is a tie do not fire.

Given a three input node with a 1-taught set $(x_1, x_2, x_3)$ of 111 and 101, and 0-taught set
of 000 and 001, the following truth table is produced prior to the application of the firing
rule:


x_1    0    0    0    0    1    1    1    1
x_2    0    0    1    1    0    0    1    1
x_3    0    1    1    0    0    1    0    1
F      0    0   0/1  0/1  0/1   1   0/1   1

Table 2.1  Truth table prior to firing rule application.

Ambiguities exist in all input patterns which were not part of the training set.  As an
example of the way the firing rule is applied, take pattern 010.  It differs from 000 only
in the second element, from 001 in the second and third elements, from 101 in all three
elements, and from 111 in the first and third elements.  Pattern 010 is closest to pattern
000, which is part of the 0-taught set.  Therefore the node will not fire when presented
with the 010 pattern.  After the application of the firing rule the truth table becomes:


x_1    0    0    0    0    1    1    1    1
x_2    0    0    1    1    0    0    1    1
x_3    0    1    1    0    0    1    0    1
F      0    0    0    0    0    1    1    1

Table 2.2  Truth table after firing rule application.

The firing rule allows the node to classify input sets, which have not been seen during
training  and  are  not  identical  to  the  training  inputs,  by  comparing  the  degree  of  similarity 
or  distance  between  sets.  This  gives  the  neural  net  one  of  its  most  powerful  and 
intriguing  abilities. 
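
The firing rule of this example can be checked with a short sketch. It assumes that 'nearest' means the smallest number of differing input elements (the Hamming distance); the names used are hypothetical.

```python
from itertools import product

ONE_TAUGHT = ["111", "101"]    # patterns that cause the node to fire
ZERO_TAUGHT = ["000", "001"]   # patterns that prevent it from firing


def hamming(a, b):
    """Number of input elements in which patterns a and b differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def fires(pattern):
    """Apply the firing rule: fire only if strictly nearer to the 1-taught set."""
    d_one = min(hamming(pattern, p) for p in ONE_TAUGHT)
    d_zero = min(hamming(pattern, p) for p in ZERO_TAUGHT)
    return 1 if d_one < d_zero else 0   # a tie means do not fire


# Evaluate the rule for all eight three-element input patterns.
truth_table = {"".join(bits): fires("".join(bits))
               for bits in product("01", repeat=3)}
```

For pattern 010 the distances are 1 to the 0-taught set and 2 to the 1-taught set, so the node does not fire. Patterns 011 and 100 are ties and therefore also do not fire, while 110 is nearer to 111 and fires.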

Listed  below  are  three  commonly  used  firing  rules  (Haykin,  1994): 


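1.  Threshold Function.

$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases} \qquad (2.2)$$

2.  Piecewise-Linear Function.

$$\varphi(v) = \begin{cases} 1 & v \ge 1/2 \\ v & 1/2 > v > -1/2 \\ 0 & v \le -1/2 \end{cases} \qquad (2.3)$$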


3.  Sigmoid Function.

$$\varphi(v) = \frac{1}{1 + e^{-av}} \qquad (2.4)$$

where $a$ is the slope parameter of the sigmoid.

The  sigmoid  and  related  functions  are  by  far  the  most  common  form  of  activation 
function  used  in  the  construction  of  neural  networks  (Haykin,  1994).  It  is  non-linear, 
which  is  required  to  solve  non-linearly  separable  problems.  It  offers  a  continuous  range 
of  values  from  0  to  1,  is  smooth,  and  is  differentiable  (the  importance  of  this  will  become 
apparent  during  the  discussion  of  error  correction  learning).  A  plot  of  this  function  is 
given  in  Figure  2.4. 
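
Two of these firing rules can be sketched directly. The function names are hypothetical, and the slope parameter `a` of the sigmoid (default 1) is an assumption of this sketch. The derivative is included because differentiability is the property that error-correction learning relies on.

```python
import math


def threshold(v):
    """Threshold firing rule: fire (1) when v is zero or positive."""
    return 1.0 if v >= 0 else 0.0


def sigmoid(v, a=1.0):
    """Sigmoid firing rule with assumed slope parameter a; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a * v))


def sigmoid_slope(v, a=1.0):
    """Derivative of the sigmoid, written in terms of its own output."""
    s = sigmoid(v, a)
    return a * s * (1.0 - s)


outputs = [sigmoid(v) for v in (-10.0, 0.0, 10.0)]
```

Unlike the threshold rule, the sigmoid changes smoothly around v = 0, so a small change in a weight produces a small, computable change in the output. For very large negative `a * v` this simple form can overflow, which the sketch does not guard against.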

2.  Learning  Process 

Fundamental to the performance and effectiveness of a neural net is its ability to
learn.  Learning in the context of neural networks has been defined as:

Learning  is  the  process  by  which  the  free  parameters  of  a  neural  network  are 
adapted  through  a  continuing  process  of  stimulation  by  the  environment  in  which 
the  network  is  embedded.  The  type  of  learning  is  determined  by  the  manner  in 
which  the  parameter  changes  take  place.  (Haykin,  1994) 

This  definition  of  the  learning  process  implies  the  following  sequence  of  events: 

1.  The neural network is stimulated by an environment.

2.  The  neural  network  undergoes  changes  as  a  result  of  this  stimulation. 

3.  The  neural  network  responds  in  a  new  way  to  the  environment,  because  of  the 
changes  that  have  occurred  in  its  structure.  (Haykin,  1994) 

In  the  above  example,  the  learning  process  had  already  taken  place.  It  was  used  to  teach 
the  network  to  fire  appropriately  for  the  training  patterns.  If,  during  training,  the  output 
of the network had been 1 for the input 000, then the weights associated with each input
would have been adjusted such that the output would become 0.  This process is
demonstrated  in  Section  B.3.a  of  this  chapter. 

Many  algorithms,  or  'learning  rules',  have  been  developed  to  train  neural  nets.  These 
include Error-Correction, Hebbian, Competitive, and Boltzmann learning (Haykin, 1994).
All learning rules can be classified as either supervised or unsupervised.  These are
defined as follows:

Supervised  training  requires  the  pairing  of  each  input  vector  with  a  target  vector 
representing  the  desired  output.  An  input  vector  is  applied  and  the  output  of  the 
network  is  calculated  which  is  compared  with  the  corresponding  target  vector.  The 
difference  (error)  is  fed  back  through  the  network  and  its  weights  are  changed, 
according  to  an  algorithm  that  tends  to  minimize  the  error. 

Unsupervised  learning  requires  no  target  vector  for  the  outputs.  The  training  set 
consists  solely  of  input  vectors.  The  training  algorithm  modifies  network  weights  to 
produce  output  vectors  that  are  consistent.  The  training  process  extracts  the  statistical 
properties  of  the  training  set  and  groups  similar  input  vectors  into  classes. 
Unsupervised training is a far more plausible model of learning in the biological
system.  (Nasrabadi, 1994)

Supervised learning will be presented through a discussion of the Delta Rule, which is a
form of Error-Correction.  Unsupervised learning will be examined in Section C of this
chapter as part of the discussion on Kohonen Self-Organizing Maps.

3. Error-Correction Learning: The Delta Rule

The  ability  of  a  neural  net  to  learn  lies  in  its  variable  connection  weights.  The  initial 
weights  are  generated  in  a  random  fashion  and  contain  no  useful  information.  An 
effective  method  of  updating  these  weights  must  be  incorporated.  This  was  a  large 
stumbling  block  in  the  development  of  neural  nets,  especially  nets  containing  hidden 
layers  as  depicted  in  Figure  5.  Error-correction  is  one  means  of  accomplishing  this. 

The Delta Rule, also known as the Widrow-Hoff Rule, was first suggested by
Bernard Widrow in 1962 (Widrow, 1962). It has been widely used and is well
understood. The concept of the Delta Rule is to take the difference ($\varepsilon$) between the
threshold of the target output ($t$) plus a desired overshoot ($\delta$) and the node summation
$v(k)$, multiply by a constant ($\eta$) known as the 'learning rate', and adjust ($\Delta w$) the active
weights accordingly. In equation form this looks like:

$$\varepsilon = (t + \delta) - v(k) \tag{2.5}$$

$$\Delta w_{kz}(n) = \eta\,\varepsilon_k(n) \tag{2.6}$$

where $k$ represents the node of interest, $z$ indicates the weight associated with node $k$, and
$n$ denotes the epoch or learning iteration. The rule should be applied to the desired truth
table using the following steps:

1. Select a truth table column.

2. If an error is detected, determine the distance between v(k) and the desired firing
value.

3. Adjust the weights that have firing inputs and the threshold to remove a portion of
the error.

4. Go back to step 1, until none of the columns cause errors. (Aleksander and
Morton, 1990)

The best way to describe the Delta Rule is through an example. A simple system
consisting of a discrete node with two inputs and a bias (which is held constant at 1)
is chosen to solve a linearly separable problem. Figure 2.6 depicts this system. The
problem is defined as follows: Classify the four points A, B, C, and D into two groups
as depicted in Figure 2.7. Table 2.3 depicts the truth table solution illustrated in
Figure 2.7.


         A    B    C    D
x_1      0    0    1    1
x_2      0    1    0    1
F        0    1    0    1

Table 2.3 Example problem truth table.


Initial conditions are set as follows:

$$w_0 = +0.2 \qquad w_1 = -0.25 \qquad w_2 = 0.2$$

$$\delta = \pm 0.1 \qquad \eta = 0.4$$

The  threshold  function  is  used  as  the  firing  rule. 

Training is started by presenting point A to the node with the desired output of 0. The
summation $v$ is calculated according to Eq. 2.4 to be 0.2, which would result in an
incorrect output of 1. The error $\varepsilon$ is calculated using Eq. 2.5 to be -0.3 and $\Delta w$ (by Eq.
2.6) to be -0.12. $\Delta w$ is applied only to $w_0$ because it is the sole active weight. Once $\Delta w$
is applied, $w_0$ becomes 0.08. The complete process is depicted in Table 2.4.

The final solution yields the following equation:

$$v = -0.176\,x_1 + 0.274\,x_2 - 0.017 \tag{2.7}$$

Setting $v$ equal to the threshold (0) and rearranging produces Eq. 2.8.

$$x_1 = 1.56\,x_2 - 0.10 \tag{2.8}$$

This line is superimposed on the original problem (Figure 2.7) to illustrate that it does in
fact separate the two groups appropriately (see Figure 2.8). While this figure makes it
obvious that Eq. 2.8 is a solution, it also makes it obvious that it is not a unique solution.
Because neural nets present non-unique solutions, proper training for each environment is
imperative to their success.
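The worked example can be retraced with a short sketch. This is a minimal Python illustration and not code from the thesis: the function name `train_delta_rule`, the convention that the node fires when the summation exceeds the threshold, and the order in which the four input pairs are presented (chosen to follow the summation values listed in Table 2.4) are all assumptions.

```python
def train_delta_rule(cases, weights, delta=0.1, eta=0.4, threshold=0.0,
                     max_updates=100):
    """Train a single threshold node; returns (weights, number_of_updates).

    weights[0] is the bias weight, whose input is held constant at 1.
    Each case is ((x1, x2), desired_output).
    """
    w = list(weights)
    updates = 0
    clean_pass = False
    while not clean_pass:                      # step 4: repeat until no column errs
        clean_pass = True
        for inputs, target in cases:           # step 1: select a column
            x = (1,) + tuple(inputs)
            while True:
                v = sum(wi * xi for wi, xi in zip(w, x))   # node summation
                fired = 1 if v > threshold else 0
                if fired == target:
                    break
                if updates >= max_updates:
                    raise RuntimeError("no convergence within max_updates")
                clean_pass = False
                # step 2: aim just past the threshold on the desired side
                aim = threshold + (delta if target == 1 else -delta)
                error = aim - v                            # Eq. 2.5
                dw = eta * error                           # Eq. 2.6
                # step 3: only weights with a firing input (and the bias) change
                w = [wi + dw * xi for wi, xi in zip(w, x)]
                updates += 1
    return w, updates


# Assumed presentation order: (x1, x2) -> desired output
cases = [((0, 0), 0), ((1, 0), 0), ((0, 1), 1), ((1, 1), 1)]
final_w, n_updates = train_delta_rule(cases, [0.2, -0.25, 0.2])
print([round(wi, 3) for wi in final_w], n_updates)
```

Under these assumptions the final weights come out close to the coefficients of Eq. 2.7. A different presentation order or firing convention would generally give a different, equally valid separating line, which is the non-uniqueness noted above.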


EPOCH  INPUT    w_0      w_1      w_2       v      COMMENT
  1      A     0.200   -0.250    0.200    0.200    $\Delta w$ = -0.120
  2      A     0.080   -0.250    0.200    0.080    $\Delta w$ = -0.072
  3      A     0.008   -0.250    0.200    0.008    $\Delta w$ = -0.043
  4      A    -0.035   -0.250    0.200   -0.035    Column 1 Satisfied
  5      B    -0.035   -0.250    0.200   -0.285    Column 2 Satisfied
  6      C    -0.035   -0.250    0.200    0.165    Column 3 Satisfied
  7      D    -0.035   -0.250    0.200   -0.085    $\Delta w$ = 0.074
  8      D     0.039   -0.176    0.274    0.137    Column 4 Satisfied
  9      A     0.039   -0.176    0.274    0.039    $\Delta w$ = -0.056
 10      A    -0.017   -0.176    0.274   -0.017    Column 1 Satisfied
 11      B    -0.017   -0.176    0.274   -0.193    Column 2 Satisfied
 12      C    -0.017   -0.176    0.274    0.257    Column 3 Satisfied
 13      D    -0.017   -0.176    0.274    0.082    Column 4 Satisfied

Table 2.4 Iterations of network.

C. KOHONEN'S SELF-ORGANIZING MAP


1.  Motivation 

Economic  representation  of  data  with  all  their  interrelationships  is  one  of  the  most 
central  problems  in  information  sciences,  and  such  an  ability  is  obviously 
characteristic  of  the  operation  of  the  brain,  too.  In  thinking,  and  in  the  subconscious 
information  processing,  there  is  a  general  tendency  to  compress  information  by 
forming  reduced  representations  of  the  most  relevant  facts,  without  loss  of 
knowledge  about  their  interrelationships.  The  purpose  of  intelligent  information 
processing  seems  in  general  to  be  creation  of  simplified  images  of  the  observable 
world  at  various  levels  of  abstraction,  in  relation  to  a  particular  subset  of  received 
data.  (Kohonen,  1987) 

From  this  perspective  Teuvo  Kohonen  has  investigated  and  tried  to  duplicate  the 
processes  that  the  brain  uses  to  recognize,  categorize  and  identify. 

2.  Biological  Foundation 

Evidence  generated  in  the  late  1970's  indicates  that  the  brain  forms  neural 
representations  of  the  various  sensory  inputs,  adjusting  these  representations  over  time  to 
form  an  image  of  that  input.  This  was  followed  by  the  discovery  that  certain  areas  of  the 
cortex,  when  stimulated  by  a  sensory  input,  produce  a  response  which  preserves  the 
topographical order of the input. The sensory responses of the auditory cortex exhibit
this behavior. In the auditory cortex there exists a "tonotopic map" in which the spatial
order of cell responses corresponds to the pitch or acoustic frequencies of tones perceived
(Kohonen, 1987). Sounds received by the ear produce a response from a defined area on
the  cortex.  The  lowest  perceptible  acoustic  frequencies  induce  a  response  at  one  edge  of 
this  area  while  the  highest  frequencies  induce  a  response  at  the  opposite  edge.  The  map 
of  acoustic  frequency  response  across  this  area  is  perfectly  ordered  and  almost 
logarithmic  with  respect  to  frequency. 

The  possibility  that  the  representation  of  knowledge  in  a  particular  category  of  things 
in  general  might  assume  the  form  of  a  feature  map  that  is  geometrically  organized  over 
the corresponding piece of the brain motivated Kohonen to do a series of theoretical
investigations into this process. The results of these investigations (presented in Chapter
5  of  Self-Organization  and  Associative  Memory)  led  him  to  believe  that  one  and  the  same 
functional  principle  might  be  responsible  for  self-organization  of  widely  different 
representations  of  information.  Further  investigation  led  to  the  realization  that  many 
functional  properties  inherent  to  self-organizing  systems  are  exhibited  by  neural 
networks.  His  networks  attempt  to  preserve  the  topological  relations  of  the  data  while 
performing  a  dimensionality  reduction  of  the  representation  space  (Kohonen,  1987). 
Hyperspectral  data,  being  of  many  dimensions,  appears  to  be  a  prime  candidate  for 
analysis  by  such  networks. 

3.  Network  Description 

Stan  Openshaw  describes  Kohonen's  self-organizing  map  as  one  of  the  most 
interesting  of  all  the  competitive  neural  nets.  Its  fascination  results  from  the  realization 
that  self-organization  is  a  very  powerful  neural  process  and  that  parts  of  the  brain 
certainly  seem  to  operate  in  a  similar  fashion  (Openshaw,  1994).  It  is  designed  to 
classify data consisting of many variables. As to its ability to perform this task,
Openshaw attributes the following characteristics: (1) simplicity in algorithmic design;
(2) ability to handle immense complexity; (3) nice mathematical properties; (4) user
induced flexibility; and (5) a plausible degree of biological inspiration. Simplicity and
flexibility make this approach very attractive.

Many  variations  of  Kohonen's  original  design  are  in  existence.  Openshaw  offers  an 
excellent  description  of  the  basic  algorithm  which,  with  appropriate  modifications,  is 
reproduced  here. 

A  basic  Kohonen  network  algorithm  can  be  outlined  in  the  following  steps: 

Step  1.  Initialization.  Define  geometry,  dimensionality,  and  size  of  neuron 
array. 

Step 2. Each neuron has a vector of M weights. Set these weights to some
initial value, usually random values.

Step 3. Select a data case that also has M variable values and apply any relevant
measurement noise to the data.

Step 4. Find whichever neuron is 'nearest' to the data case under consideration.

Step 5. 'Update' the vectors of M weights for all the neurons in the topological
neighborhood of the winning neuron; otherwise leave them alone.

Step 6. Reduce the learning parameter and neighborhood weights by a very, very small
amount.

Step 7. Repeat steps 3 to 6 until convergence, typically a large number of times.

Step 8. Once training is complete, classify data.

This algorithm is computationally simple, produces reliable maps, can be modified to
accept vectors of any dimensionality and is suitable for parallel processing. The
particulars of the above steps are as follows:

a. Step 1

The  process  is  started  by  defining  the  geometry  of  the  map.  This  geometry 
consists  of  the  number  of  variables  contained  in  the  input  data  and  the  size  of  the  array  of 
processing nodes. The processing nodes are similar to that depicted in Figure 2.1. The
number  of  adjustable  weights  associated  with  each  node  is  set  equal  to  the  number  of 
input  variables.  Generally  the  array  is  sized  into  either  one  or  two  dimensions. 

Choosing  the  number  of  nodes  present  in  the  array  is  based  on  the  expected  number  of 
categories  the  data  is  to  be  divided  into  and  experience  using  the  network.  Each 
processing node has the potential to define a category, but need not. As an example, take
a 5 x 5 array of nodes. When presented with a data set, this array can separate the data
into  between  1  and  25  categories.  The  number  of  categories  into  which  any  data  set  is 
divided  depends  on  the  number  of  processing  nodes  and  the  degree  of  similarity 
throughout  that  data  set.  Allowing  arrays  of  different  sizes  to  process  the  same  data  set 
will  provide  the  user  with  valuable  information  about  the  workings  of  the  network  and 
the data set. Figure 2.9 depicts the simplest of Kohonen networks, a 2 x 1 array, where
$x$ represents an input vector of $M$ components. The output of each node equates to
whether or not that input vector is a member of its category.

b.  Step  2 

In  order  to  prevent  a  bias,  the  weights  are  initialized  randomly.  Most  high  level 
programming  languages  contain  some  type  of  random  number  generator  which  can  be 
modified  to  initialize  the  weights  to  numbers  comparable  to  the  input  values.  While 
initialization  of  the  weights  to  values  which  are  comparable  to  the  inputs  is  not  critical,  it 
will  decrease  the  number  of  iterations  required  for  training. 
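One way to initialize the weights to values comparable to the inputs is sketched below. This is an illustration only: the helper name `init_weights`, the choice of a uniform distribution, and the per-variable minimum and maximum as the sampling range are assumptions, not the method used in Appendix A.

```python
import random


def init_weights(data, n_nodes, seed=0):
    """Return n_nodes weight vectors drawn uniformly within the range of each input variable."""
    rng = random.Random(seed)          # seeded so that runs are repeatable
    n_vars = len(data[0])
    lows = [min(case[j] for case in data) for j in range(n_vars)]
    highs = [max(case[j] for case in data) for j in range(n_vars)]
    return [[rng.uniform(lows[j], highs[j]) for j in range(n_vars)]
            for _ in range(n_nodes)]


# Three hypothetical data cases with two variables each
data = [[0.0, 10.0], [2.0, 30.0], [1.0, 20.0]]
weights = init_weights(data, n_nodes=4)
```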

c.  Step  3 

This  step  begins  to  separate  the  neural  network  from  traditional  classification 
schemes. Normally data would be presented in sequential order, each having the same
weighting. Some data sets lend themselves to sequential, even-weighting analysis (the
data sets analyzed in this thesis fall into this category); others are analyzed more
accurately by equalizing noise levels and sampling some data cases more often, to reflect
the reliability of the data (for applicable examples see Openshaw, 1994). Analysis done
in this thesis was based on random selection of the data cases.

d.  Step  4 

Determining the best matched neuron to represent the input case is fairly simple
and extremely important. Various mathematical methods exist for determining 'nearness'
or 'similarity', and depending on the form of data measurement and desired results,
different measures will be appropriate. Trying different methods offers insight into the
network and the data set. The two measures used in this analysis are known as the sum
of absolute differences (Eq. 2.9) and the correlation function (Eq. 2.10). In Eq. 2.9 $D$
stands for difference and the smallest $D$ will be declared the winner. In Eq. 2.10 $C$
stands for correlation and the $C$ which is closest to 1 (1 being an exact correlation, 0
being uncorrelated, and -1 being anti-correlated) will be declared the winner. In both
equations $x$ is the input vector and $w$ is the weight vector, with components $x_i$ and
$w_i$. In Eq. 2.10 $x \cdot w$ is the dot, or inner, product of the two vectors.

$$D = \sum_{i=1}^{M} \left| x_i - w_i \right| \tag{2.9}$$

$$C = \frac{x \cdot w}{|x|\,|w|} \tag{2.10}$$

e.  Step  5 

The  adaptive  behavior  of  the  self-organizing  map  results  from  this  part  of  the 
algorithm.  Once  the  winner  has  been  declared,  the  weights  associated  with  that  neuron 
and  its  neighbors  are  adjusted  such  that  the  similarity  between  the  input  and  the  nodes  is 
increased.  The  size  of  the  'neighborhood'  of  the  winning  neuron  is  initially  set  to  some 
'distance'  and  all  neurons  inside  this  distance  are  updated.  The  neighborhood  gradually 
becomes  more  and  more  exclusive  over  time,  such  that,  at  the  end  of  the  training  cycle, 
only  one  neuron  is  being  updated  for  each  input.  This  process  causes  the  network  to 
gradually  become  tuned  to  different  inputs  in  an  orderly  fashion,  almost  as  if  a 
continuous mapping of the input space was formed over the network (Openshaw, 1994).
This  results  in  parts  of  the  network  closely  resembling  the  different  input  patterns.  This 
ordering  and  smoothing  process  is  extremely  subtle. 

Two updating algorithms were used in the data analysis portion of this thesis. The
first and simplest is known as a block party, and is defined by Kohonen as follows:

$$w_i(t+1) = w_i(t) + \alpha(t)\,[x - w_i(t)] \quad \text{for } i \in N_c(t) \tag{2.11}$$

else

$$w_i(t+1) = w_i(t) \tag{2.12}$$

where $w_i(t)$ is the weight vector for any neuron $i$ which lies within the neighborhood set
$N_c(t)$ of the winning neuron, $w_i(t+1)$ is the updated weight vector for this neuron $i$, $x$ is the
vector of values for the input data case, and $\alpha(t)$ is a training constant (also known as the
'learning rate') which establishes the size of the update. The training constant is typically
started at some value between 0 and 1, and decreases with time. This training process is
illustrated in Figures 2.10 and 2.11. This portion of the analysis (which will be discussed
in more detail in Chapter 4) was set up to demonstrate the network's ability to correctly
classify four 'made up' spectra having dramatically differing characteristics. Figure 2.10
depicts the four input spectra. Figure 2.11 is a time lapsed plot of the four sets of
weights. Immediately noticeable is that the final, trained weights very closely resemble
the inputs. Closer investigation of this plot reveals the approximate time at which the
updating process transitioned to single node updating per input. This is evident from the
constancy of the weights after the transition.
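The block update of Eqs. 2.11 and 2.12 amounts to a few lines of code. In the sketch below the function name `block_update` and the representation of the neighborhood as a set of node indices are assumptions made for illustration.

```python
def block_update(weights, x, neighborhood, alpha):
    """Move every node in the neighborhood toward x (Eq. 2.11); copy the rest (Eq. 2.12)."""
    return [
        [wi + alpha * (xi - wi) for wi, xi in zip(w, x)] if i in neighborhood else list(w)
        for i, w in enumerate(weights)
    ]


weights = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
# Nodes 0 and 1 are inside the neighborhood of the winner; node 2 is not.
updated = block_update(weights, x=[2.0, 2.0], neighborhood={0, 1}, alpha=0.5)
print(updated)
```

Every node inside the neighborhood moves the same fraction of the way toward the input, which is why the weights stop changing for a node once it no longer falls inside any winner's neighborhood.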

The second, and slightly more sophisticated, updating algorithm is one which makes
the learning rate dependent on the distance ($d$) from the winning neuron. In effect $\alpha(t)$
becomes $\alpha(d,t)$. A useful, simple method of accomplishing this is a Gaussian function of
distance from the winning neuron. Muller and Reinhardt use the following function
(Openshaw, 1994):

$$\alpha(d,t) = \exp\left[-\frac{d^2}{2P(t)}\right] \tag{2.13}$$

where $P(t)$ is the size of the neighborhood at time $t$.
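A sketch of the distance-dependent update follows. It assumes a one-dimensional array of nodes in which the distance between two nodes is the difference of their indices; the function names are hypothetical.

```python
import math


def gaussian_rate(d, p):
    """Learning rate as a Gaussian function of distance d, for neighborhood size p."""
    return math.exp(-d ** 2 / (2.0 * p))


def gaussian_update(weights, x, winner, p):
    """Update every node of a 1-D array, scaled by its distance from the winner."""
    updated = []
    for i, w in enumerate(weights):
        rate = gaussian_rate(abs(i - winner), p)
        updated.append([wi + rate * (xi - wi) for wi, xi in zip(w, x)])
    return updated


weights = [[0.0], [0.0], [0.0]]
print(gaussian_update(weights, x=[1.0], winner=0, p=2.0))
```

With this function the rate is exactly 1 at the winning node, so the winner moves all the way onto the input, and the rate falls off smoothly for nodes further away. Shrinking the neighborhood size narrows the set of nodes that receive a meaningful update.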

f.  Step  6 

In order to achieve stability, the training parameters (i.e. learning rate and
neighborhood size) must be decreased slowly with time. This process, known as
simulated annealing, and its importance, has already been described in section A of this
chapter. The algorithm used in this thesis to accomplish this process employed a linear
decrease according to iteration number. These equations are as follows:

$$\alpha(t) = \alpha_0 \left(1 - \frac{I}{I_{max}}\right) \tag{2.14}$$

$$d(t) = d_0 \left(1 - \frac{I}{I_{max}}\right) \tag{2.15}$$

where $\alpha_0$ is the initial training constant, $I$ is the iteration number, $I_{max}$ is the total
number of iterations, and $d_0$ is the initial neighborhood defining distance. The selection of
initial values for $\alpha_0$ and $d_0$ is best guided by experience. The training constant is
normally constrained to $0 < \alpha_0 < 1$.
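Both schedules share the same linear form, so a single helper covers them. The name `linear_decay` and the example starting values are assumptions chosen for illustration.

```python
def linear_decay(initial, iteration, total_iterations):
    """Linear decrease from `initial` at iteration 0 to zero at the final iteration."""
    return initial * (1.0 - iteration / total_iterations)


total = 1000
for iteration in (0, 250, 500, 750, 1000):
    alpha = linear_decay(0.5, iteration, total)    # learning rate, Eq. 2.14
    radius = linear_decay(3.0, iteration, total)   # neighborhood distance, Eq. 2.15
    print(iteration, alpha, radius)
```

Once the neighborhood distance drops below the spacing between adjacent nodes, only the winning node is updated, which corresponds to the transition to single node updating described in Step 5.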

g.  Step  7 

The  optimum  number  of  iterations,  which  defines  the  number  of  training  cycles, 
depends  on  the  number  of  neurons,  the  size  of  the  data  set,  and  the  noise  level  of  the  data. 
No  empirical  relationships  exist  to  define  this  number.  Again  experience  with  the 
network offers the best insight. For small data sets and few neurons (e.g. 16 inputs, 50
variables per input, little noise, 4 neurons) 1000 iterations may be enough, while for
large data sets and many neurons (e.g. 50,000 inputs, 70 variables, some noise, 50
neurons) 2,000,000 iterations may not be enough.

h.  Step  8 

Finally,  once  all  training  is  complete,  the  inputs  are  classified.  Each  input  is 
compared  to  each  neuron  according  to  some  measure  of  similarity,  normally  the  same 
measure  used  in  training.  The  input  is  declared  a  member  of  the  winning  neuron's 
category. 

Appendix A of this paper contains a Kohonen network written in IDL (Interactive
Data Language). It is based on the FORTRAN program given in Appendix II of
Openshaw, 1994. Chapter 4 will analyze the applicability of this network to classifying
hyperspectral imagery.
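For orientation, the eight steps can be strung together in a compact sketch. This is not the IDL program of Appendix A: it is a minimal Python illustration that assumes a one-dimensional array of nodes, the sum of absolute differences as the similarity measure, the block update, and linear decay of both training parameters. All names and the four made-up spectra are hypothetical.

```python
import random


def nearest(x, weights):
    """Step 4: index of the node with the smallest sum of absolute differences."""
    return min(range(len(weights)),
               key=lambda i: sum(abs(a - b) for a, b in zip(x, weights[i])))


def train_som(data, n_nodes, iterations, alpha0=0.5, d0=None, seed=0):
    rng = random.Random(seed)
    n_vars = len(data[0])                      # Step 1: geometry of the map
    if d0 is None:
        d0 = n_nodes / 2.0
    low = min(min(case) for case in data)
    high = max(max(case) for case in data)
    weights = [[rng.uniform(low, high) for _ in range(n_vars)]   # Step 2
               for _ in range(n_nodes)]
    for it in range(iterations):               # Step 7: repeat
        x = rng.choice(data)                   # Step 3: random data case
        win = nearest(x, weights)              # Step 4: winning node
        fraction = 1.0 - it / iterations       # Step 6: linear decrease
        alpha = alpha0 * fraction
        radius = d0 * fraction
        for i, w in enumerate(weights):        # Step 5: block update
            if abs(i - win) <= radius:
                for j in range(n_vars):
                    w[j] += alpha * (x[j] - w[j])
    return weights


def classify(data, weights):
    """Step 8: assign each input to the category of its winning node."""
    return [nearest(x, weights) for x in data]


spectra = [[1.0, 1.0, 1.0, 1.0, 1.0],
           [1.0, 2.0, 3.0, 4.0, 5.0],
           [5.0, 4.0, 3.0, 2.0, 1.0],
           [1.0, 5.0, 1.0, 5.0, 1.0]]
trained = train_som(spectra, n_nodes=4, iterations=2000)
print(classify(spectra, trained))
```

How many distinct categories result depends on the random initialization, the number of iterations and the starting parameters, so the printed labels should be read as one possible outcome rather than a fixed answer.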




III. HYPERSPECTRAL IMAGERY


This chapter will begin with a look at hyperspectral imagery, its attributes and
impediments,  and  will  conclude  with  a  discussion  of  the  two  instruments  used  to  collect 
the  data  analyzed  in  Chapter  IV. 

A.  OVERVIEW 

Remote  sensing  is  defined  as  the  acquisition  of  information  about  an  object  without 
being in physical contact (Elachi, 1987). Information is acquired by detecting and
measuring changes that the object imposes on the surrounding environment. Areas of
study  in  which  measurable  changes  occur  include  acoustics,  the  electromagnetic 
spectrum  and  perturbations  of  the  gravity  field.  The  term  'remote  sensing'  is  most 
commonly  used  in  connection  with  electromagnetic  techniques  of  information  acquisition 
(Elachi,  1987).  These  techniques  cover  the  entire  electromagnetic  spectrum. 

Because  all  materials  reflect,  absorb,  or  emit  photons  in  ways  characteristic  of  their 
molecular  makeup,  a  high  resolution  trace  of  the  intensity  of  the  transmitted,  reflected  or 
emitted  radiation  versus  wavelength  forms  a  graphical  record  unique  to  a  given  material 
(Rinker,  1990).  Hyperspectral  imagery,  also  known  as  imaging  spectrometry,  is  a  form 
of  remote  sensing  that  attempts  to  reproduce  this  unique  graphical  record.  It  refers  to  the 
imaging  of  an  area  or  "scene"  over  a  large  number  of  discrete,  contiguous  spectral  bands 
such  that  the  image  contains  a  complete  reflectance  spectrum. 

Hyperspectral imaging is a follow-on to multispectral imaging, the difference being
the  spectral  resolution  of  the  collected  data.  The  primary  limitation  associated  with 
multispectral  sensors  is  an  inability  to  distinguish  between  certain  materials  because  of 
poor  spectral  resolution.  The  spectral  reflectance  and  emittance  characteristics  for 
surface  materials,  which  are  determined  by  electronic  and  vibrational  energy  states 
within  the  materials,  are  usually  too  highly  structured  to  be  resolved  at  coarse  spectral 
resolutions (Vane, 1985). Many of these identifying features occur over bandwidths on
the order of 20-40 nm. Hyperspectral sensors nominally collect data in wavelength
bands  of  10  nm  or  less,  thus  allowing  for  the  identification  of  such  features  and  in  turn 
identification  of  the  material.  Over  the  last  two  decades  parallel  advancements  in  optics, 
microelectronics  and  computer  technology  have  allowed  hyperspectral  collection  of 
imagery  data  to  mature  to  the  point  where  it  can  provide  detailed  information  in  many 
remote  sensing  environments. 
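The effect of band width on a narrow spectral feature can be illustrated with synthetic numbers. Everything in the sketch below is hypothetical: a flat reflectance of 0.40, a Gaussian absorption feature 30 nm wide and 0.20 deep centered at 700 nm, and sensors that simply average the spectrum over contiguous bands of 100 nm or 10 nm.

```python
import math


def band_average(spectrum, start, stop, width):
    """Average a spectrum sampled every 1 nm (dict: nm -> value) into contiguous bands."""
    bands = []
    for low in range(start, stop, width):
        values = [spectrum[nm] for nm in range(low, min(low + width, stop))]
        bands.append(sum(values) / len(values))
    return bands


def with_feature(base, center, fwhm, depth):
    """Copy of `base` with a Gaussian absorption feature subtracted."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return {nm: r - depth * math.exp(-((nm - center) ** 2) / (2.0 * sigma ** 2))
            for nm, r in base.items()}


def max_band_difference(a, b, width):
    """Largest difference between two spectra in any single band of the given width."""
    bands_a = band_average(a, 400, 900, width)
    bands_b = band_average(b, 400, 900, width)
    return max(abs(p - q) for p, q in zip(bands_a, bands_b))


base = {nm: 0.40 for nm in range(400, 900)}
dipped = with_feature(base, center=700, fwhm=30, depth=0.20)

coarse = max_band_difference(base, dipped, 100)   # broadband sensor
fine = max_band_difference(base, dipped, 10)      # narrow band sensor
print(round(coarse, 3), round(fine, 3))
```

In this synthetic case the 100 nm bands dilute the feature to a small fraction of its depth, while the 10 nm bands retain most of it.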

Figure 3.1 illustrates the potential of hyperspectral imagery to distinguish materials.
It is a plot of the percent reflectance vs. wavelength of two fabrics (A and C) and a green
leaf (B). It is probable that a broadband instrument would be able to distinguish fabric C
from the green leaf but would have great difficulty with fabric A. The spectral
reflectances of fabric A and the green leaf are very similar, but when the two spectra are
overlaid a noticeable difference is present. This difference could only be distinguished
by a narrow band instrument. The value of hyperspectral imaging systems lies in their
ability to collect a complete reflectance spectrum for each picture element (pixel) in the
image. This should allow for accurate identification of each pixel and comparison
between pixels, thus producing a more accurate depiction of whatever is being imaged.

Increased  resolution  does  not  come  without  a  price.  When  compared  to  current 
broadband  multispectral  sensor  systems,  hyperspectral  sensors  produce  on  the  order  of  10 
to  30  times  the  amount  of  data  per  pixel.  Storing  and  processing  this  massive  amount  of 
data  is  difficult.  Most  of  today's  attempts  at  analyzing  hyperspectral  imagery  involve 
reducing  the  dimensionality  of  the  data.  Such  reduction  methods  inevitably  lose  some  of 
the  information.  This  paper  attempts  to  develop  a  method  of  hyperspectral  data  analysis 
that  is  performed  without  reduction  of  the  dimensionality  of  the  data. 

B.  SENSORS 

This paper analyzes two data sets, collected using different sensors. A brief
examination of these instruments follows.

1. Advanced Airborne Hyperspectral Imaging System (AAHIS)

The  AAHIS  sensor  was  originally  developed  by  Science  Applications  International 
Corporation  (SAIC),  San  Diego,  California.  The  sensor  was  upgraded  in  a  cooperative 
effort  with  SETS  Technology  Inc.,  Honolulu,  Hawaii.  The  basic  block  diagram 
depicting  the  AAHIS  major  components  is  shown  in  Figure  3.2.  The  AAHIS  sensor  was 
designed for high signal-to-noise ratio performance based on its intended use in maritime
applications which involve low reflectance environments (e.g. 5% reflectance). Table
3.1 lists the AAHIS performance characteristics. The table also lists the sensor's
signal-to-noise ratio performance over a land-based (20% reflectance) environment.


Sensor Parameters                                  Nominal Value

Useful Spectral Range (nm)                         440-870
Number of Spectral Channels                        72
Nominal Bandwidth (nm)                             12.4
  * 4 channels summed on chip, 2 in software
Cross Track IFOV (mradians)                        1.1
  * 2 channels summed in software
Along Track IFOV (mradians)                        1.0
Spatial pixel number                               190
Swath Width (degrees)                              11.4
Signal-to-Noise (5 % reflectance target)           200-400
  * 50 Hz frame rate
Signal-to-Noise (20 % reflectance target)          500-850
  * 50 Hz frame rate

Table 3.1 AAHIS Sensor Characteristics

2. Spatially Modulated Imaging Fourier Transform Spectrometer (SMIFTS)

The second instrument used to collect data is a second generation SMIFTS type
spectrometer known as FTVHSI (Fourier Transform Visible Hyperspectral Imager).
SMIFTS is a cryogenically cooled, imaging, spatially modulated Fourier transform
interferometer spectrometer for spatially resolved spectral imaging (Lucey, et al., 1993).
It offers several positive characteristics which include a wide field of view, simultaneous
measurement of all spectral channels, broad wavelength range and moderate spectral
resolution. The FTVHSI was built by the Florida Institute of Technology and offers
better spatial resolution, spectral resolution and contrast (Otten, et al., 1995). This
section will begin with a discussion of the theory behind Fourier transform spectroscopy
and the Michelson interferometer and will conclude with a description of the FTVHSI
instrument and its attributes.

a.  Fourier  Transform  Spectroscopy 

Fourier transformation of a sampled interference pattern to obtain the spectrum of
the input source is the basis for Fourier transform spectroscopy, a decades old discipline
which principally utilizes Michelson interferometers to make spectral measurements. All
of the principles which apply to Michelson interferometers also apply to SMIFTS.
(Lucey, et al., 1993)

The  Michelson  interferometer  operates  by  splitting  an  electromagnetic  wave  into 
two  optical  paths,  one  of  which  can  be  adjusted  in  length.  Constructive  and  destructive 
interference  occurs  when  the  electric  fields  recombine  at  the  detector.  The  sequence  of 
intensity  measurements  for  different  path  distances  includes  the  autocorrelation  function 
of  the  electric  field.  The  Fourier  transform  of  this  autocorrelation  function  is  the 
spectrum  of  the  source,  interferometer  and  detector  system.  Figure  3.2  depicts  the 
standard  model  of  a  Michelson  interferometer,  with  S  representing  the  input  source.  A 
given ray is split in two by the half-silvered mirror; the two halves are reflected at
mirrors $M_1$ and $M_2$ and then recombined at the half-silvered mirror. Mirrors $M_1$ and
$M_2$ are positioned at distances which differ by $d$. This creates the phase variation. The
compensator  plate  is  often  inserted  to  make  the  optical  paths  symmetrical,  so  that  each 
ray  passes  through  the  same  thickness  of  glass  (Klein,  1970). 

The  ensuing  derivation  of  formulas  for  interference  spectroscopy  closely  follows 
that  which  is  presented  in  Klein,  1970. 

An interference spectrometer is a Michelson interferometer modified to use
collimated light as shown in Figure 3.3. An image of the aperture in front of the source
S is formed in the focal plane of the lens at the detector D. The lenses ensure that the
light is approximately collimated and, therefore, the equation for the phase shift for light
of wavelength $\lambda$ or frequency $\nu$ reflected from the two mirrors is given by:

$$\delta = \frac{4\pi d}{\lambda} = \frac{4\pi \nu d}{c} \tag{3.1}$$

where $d$ is the effective separation of $M_1$ and $M_2$.

For a single frequency input $\nu_1$, the time-averaged intensity at the detector can be
written as (Klein, 1970):

$$I_d = I_1 + I_2 + 2\sqrt{I_1 I_2}\,\cos\left(\frac{4\pi \nu_1 d}{c}\right) = I_0\left[1 + \chi \cos\left(\frac{4\pi \nu_1 d}{c}\right)\right] \tag{3.4}$$

where $I_1$ and $I_2$ are the intensities in each beam at the detector and $I_0 = I_1 + I_2$. An ideal
beam splitter will give $I_1 = I_2$ and $\chi = 1$.

If this concept is extended to the general case, where the spectral intensity is
described by a continuous function $I(\nu)$ and $I(\nu)\,d\nu$ represents the intensity in the beam
in the frequency range $\nu$ to $\nu + d\nu$, then Eq. 3.4 becomes:

$$I_d(\tau) = \int_0^\infty I(\nu)\,(1 + \cos 2\pi\nu\tau)\,d\nu \tag{3.5}$$

where the time constant $\tau$ is given by

$$\tau = \frac{2d}{c} \tag{3.6}$$

This result consists of an average term

$$I_0 = \int_0^\infty I(\nu)\,d\nu \tag{3.7}$$

plus an oscillatory term

$$\int_0^\infty I(\nu)\cos 2\pi\nu\tau\,d\nu \tag{3.8}$$

and can be written in the form

$$I_d(\tau) = I_0\,[1 + \gamma(\tau)] \tag{3.9}$$

where

$$\gamma(\tau) = \int_0^\infty P(\nu)\cos 2\pi\nu\tau\,d\nu \tag{3.10}$$

is the normalized oscillatory term and where

$$P(\nu) = \frac{I(\nu)}{I_0} \tag{3.11}$$

is the normalized spectral distribution function. $I_d(\tau)$ is the output of the interferometer
and is known as an interferogram. Eq. 3.10 expresses $\gamma(\tau)$ as the Fourier transform of
$P(\nu)$; we can then recover $P$ as a function of frequency if we can determine $\gamma$ as a
function of $\tau$ for all $\tau$, by Fourier inversion (Klein, 1970).
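The recovery of a spectrum from an interferogram can be illustrated numerically. The sketch below is a simplified, discrete stand-in for the continuous integrals: it assumes a source made of two spectral lines with hypothetical frequencies (3 and 5, in arbitrary units) and normalized strengths (0.7 and 0.3), samples the normalized oscillatory term over a finite range of delays, and applies a cosine transform to the samples. The function names are hypothetical.

```python
import math


def interferogram(lines, tau):
    """Normalized oscillatory term for a line spectrum: sum of P_k cos(2 pi nu_k tau)."""
    return sum(p * math.cos(2.0 * math.pi * nu * tau) for nu, p in lines)


def recover(gamma_samples, dtau, nu):
    """Cosine-transform estimate of the line strength at frequency nu."""
    total_delay = dtau * len(gamma_samples)
    s = sum(g * math.cos(2.0 * math.pi * nu * m * dtau)
            for m, g in enumerate(gamma_samples))
    return 2.0 * s * dtau / total_delay


lines = [(3.0, 0.7), (5.0, 0.3)]     # (frequency, normalized strength)
dtau = 0.01
samples = [interferogram(lines, m * dtau) for m in range(400)]

for nu in (3.0, 4.0, 5.0):
    print(nu, round(recover(samples, dtau, nu), 6))
```

The estimate returns the line strengths at the two source frequencies and essentially zero in between. The range of delays here spans a whole number of periods of every frequency involved; a real instrument samples a finite range of delays that does not, which limits the spectral resolution that can be recovered.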

Much of the literature on Fourier Transform Spectroscopy talks of the
relationship between the interferogram and the autocorrelation function. What follows is
a derivation given by Klein, 1970, which illustrates this relationship.

Suppose that the beam splitter in the interferometer performs ideally. Then, the
detector responds to the square of the electric field $E(t)$ from $M_1$, and the delayed return
from $M_2$, $E(t-\tau)$. The detector signal is then proportional to the time average of the total
field squared and, neglecting constants, is

$$I_d = \langle [E(t) + E(t-\tau)]^2 \rangle = \langle E(t)^2 \rangle + \langle E(t-\tau)^2 \rangle + 2\langle E(t)\,E(t-\tau) \rangle \tag{3.12}$$


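The time averages are defined as follows:

$$\langle E(t)\,E(t-\tau) \rangle = \frac{1}{T}\int_{t_0}^{t_0+T} E(t)\,E(t-\tau)\,dt \tag{3.13}$$

For large T and stationary E fields the integral is independent of $t_0$. This is equivalent to taking the limit as $T \to \infty$.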

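The terms $\langle E(t)^2 \rangle$ and $\langle E(t-\tau)^2 \rangle$ represent the respective intensities received in the beams from $M_1$ and $M_2$, and for a steady source are each equal to $I_0/2$, where $I_0$ is the total intensity at the detector if interference does not occur. The term $2\langle E(t)\,E(t-\tau) \rangle$ contains all the interference effects. If it is normalized by dividing by $I_0$ and we define

$$\gamma(\tau) = \frac{\langle E(t)\,E(t-\tau) \rangle}{\langle E(t)^2 \rangle} \tag{3.14}$$

then we have

$$I_d = I_0\,[1 + \gamma(\tau)] \tag{3.15}$$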

This last equation has the same form as Eq. 3.9 and is deliberately written using the same function $\gamma(\tau)$. The two definitions of $\gamma(\tau)$, Eqs. 3.10 and 3.14, are equivalent (Klein, 1970, offers a rigorous proof of this in Appendix B). When written in the form of Eq. 3.14, $\gamma(\tau)$ is called the normalized autocorrelation function of the signal $E(t)$. It describes the mean correlation of E with itself at an earlier time (Klein, 1970).

The equivalence of the two expressions for $\gamma(\tau)$, namely the equation

$$\frac{\langle E(t)\,E(t-\tau) \rangle}{\langle E(t)^2 \rangle} = \int_0^\infty P(\nu)\cos 2\pi\nu\tau\,d\nu \tag{3.16}$$

is  called  the  Wiener-Khintchine  theorem.  It  says  that  the  normalized  autocorrelation 
function  of  a  signal  is  the  cosine  Fourier  transform  of  its  spectral  distribution  function. 
The  spectral  distribution  of  a  signal  can  be  obtained  by  measuring  its  electric  field  for 
some period of time.

b. FTVHSI Attributes.

A  SMIFTS  type  instrument  employs  three  major  optical  subsystems:  a  Sagnac 
interferometer  that  produces  the  spatially  modulated  interferogram;  a  Fourier  transform 
lens  that  frees  the  spectral  properties  of  dependence  on  aperture  geometry  and  allows  the 
wide  field  of  view;  and  a  cylindrical  lens  that  reimages  one  axis  of  the  input  aperture 
onto the detector array providing the one dimension of imaging. Figure 3.5 is a schematic
depiction of a SMIFTS type instrument (Lucey, et al., 1993).

The Sagnac interferometer is known as a triangle path or common path
interferometer.  The  similarities  between  it  and  the  Michelson  interferometer  are  obvious. 
Again,  the  phase  change  is  induced  by  displacement  of  one  of  the  mirrors. 

The  spectral  range  of  any  interferometer  is  based  on  the  spectral  response  of  the 
detector  array  and  the  transmission/reflection  characteristics  of  the  optics.  In  this  case,  a 
detector  made  of  silicon  is  used  to  sample  the  resultant  interference  pattern.  The  spectral 
response range of this detector is between 270 nm and 1140 nm. The instrument's
response range is limited to between 450 nm and 1040 nm due to instrument
configuration (Otten, et al., 1995).

The following table compares SMIFTS to other hyperspectral instruments.

| Instrument | Resolution (λ/Δλ) | Wavelength Range | Moving Parts | Simultaneous Acquisition | Throughput |
|---|---|---|---|---|---|
| SMIFTS | [illegible] | Broad | NO | YES | very high |
| Grating | 10^? - 10^? | Narrow | NO | YES | low |
| Prism | 10^? - 10^? | Narrow | NO | YES | low |
| Michelson | [illegible] | Broad | YES | NO | very high |
| Electronic Filter | 10^? | Narrow | NO | NO | very high |
| Mechanical Filter | [illegible] | Broad | YES | NO | very high |
| Mask Filter | 10^? | Narrow | NO | YES | very high |

Table 3.2 Hyperspectral Sensor Technology Characteristics

IV. ANALYSIS AND RESULTS


In  analyzing  the  applicability  of  the  Kohonen  Self-Organizing  Map  to  hyperspectral 
imagery,  a  systematic  increase  in  data  complexity  was  followed.  Several  times  during 
the  analysis  the  map's  ability  to  classify  hyperspectral  data  seemed  questionable. 
Understanding,  gained  at  lower  levels  of  data  complexity,  provided  knowledge  of  what 
the  network  was  doing,  why  it  was  doing  it  and  what  changes  could  be  made  to  correct 
the  problem.  What  follows  is  an  outline  of  this  process  and  the  results  and  conclusions 
drawn  from  it. 

A version of the program used in this analysis is presented in Appendix B. It is written in IDL and based on Openshaw's FORTRAN code (Openshaw, 1994). The code is easily adaptable to almost any computer language. All analysis was done on a Silicon Graphics workstation.

A.  SIMPLE,  TEST  SPECTRA 

To begin the process, four line spectra were created to represent the spectral reflectivity of four fictitious materials. These spectra, depicted in Figure 2.10, each contain fifty data points. Each spectrum was purposely created with a large degree of dissimilarity. The idea was that if the network could not discern between these spectra there would be no reason to continue the analysis. Each spectrum, which represents a category of material, was then replicated four times so that each of the four categories contained four members, for a total of sixteen inputs. The inputs were then fed into networks of varying sizes and dimensions, beginning with a small number of nodes and then increasing, in an attempt to correctly categorize the spectra. Noise levels imposed on the spectra, learning rate, neighborhood size and number of iterations were varied to observe their effects on the outcome. Noise was generated by the addition, and in some cases subtraction, of uniform random numbers ranging between 0 and 1 multiplied by a constant to produce the desired noise level.


1.  Two  Node  Network 

The two node network is the smallest nontrivial network. The parameters used in each run are presented in tabular form in Table 4.1, where d is the neighborhood size parameter, a is the learning constant, N.C. is the number of categories into which the network divided the inputs, noise level is given in percentage and number correct is subjectively determined by the author. The network consistently divided the inputs into two categories each containing eight members. Figure 2.10 also depicts how this network grouped the inputs. The spectra represented by continuous lines were grouped together, as were the spectra pictured as diamonds. Initially, the basis for this grouping scheme was not known (it was later determined to be based on average spectral intensity).


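| Matrix Size | Iterations | d | a | N.C. | MEM | Noise Applied (%) | Number Correct |
|---|---|---|---|---|---|---|---|
| 2 x 1 | 1000 | [illegible] | 0.5 | 2 | 8 | 0 | 16 |
| 2 x 1 | 1000 | [illegible] | [illegible] | 2 | 8 | 10 | 16 |
| 2 x 1 | 10000 | [illegible] | [illegible] | 2 | 8 | 10 | 16 |
| 2 x 1 | 10000 | [illegible] | [illegible] | 2 | 8 | 10 to 20 | 16 |

Table 4.1 Results of the Two Node Network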


During the training process the network attempts to replicate the inputs which are assigned to each node; hence it is interesting and insightful to look at a plot of the final weights associated with each node. Figure 4.1 is such a plot. In this network there are two nodes and fifty weights per node. The x-axis is divided into 100 increments, the first fifty of which belong to node #1, and so forth (all subsequent plots of this type follow this format). Since there are four categories and only two nodes, an accurate replication of each category is not possible. The network combines the shape of two of the spectra into a single, 'average' spectrum. Anytime too few nodes are used, some type of averaging must take place. The results of the two node network were encouraging enough to proceed to the next logical network size, four nodes.

2.  Four  Node  Network 

Knowing  that  the  data  set  consisted  of  four  categories,  high  expectations  were  held 
for  the  four  node  network.  If  this  network  could  not  produce  a  four  group,  four  member 
result,  then  the  applicability  of  the  Kohonen  network  would  appear  doubtful.  Because  of 
the  anticipated  importance  of  this  network  size,  hundreds  of  runs  were  executed.  Table 
4.2  is  a  summary  of  the  most  interesting. 


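| Run # | Matrix Size | Iterations | d | a | N.C. | MEM | Noise (%) | Number Correct |
|---|---|---|---|---|---|---|---|---|
| 1 | 4 x 1 | 10000 | [illegible] | [illegible] | 3 | 4, 4, 8 | 0 | 12 |
| 2 | 4 x 1 | 10000 | 0.6 | 0.5 | 4 | 4, 4, 4, 4 | 0 | 16 |
| 3 | 4 x 1 | 10000 | 0.4 | 0.5 | 3 | 4, 4, 8 | 10 | 8 |
| 4 | 4 x 1 | 30000 | 0.6 | 0.3 | [illegible] | 4, 4, 4, 4 | 10 | 16 |
| 5 | 2 x 2 | 10000 | [illegible] | [illegible] | 3 | 4, 4, 8 | 10 | 8 |
| 6 | 2 x 2 | 30000 | 0.5 | 0.2 | 4 | 4, 4, 4, 4 | 10 | 16 |
| 7 | 4 x 1 | 10000 | [illegible] | [illegible] | 4 | 4, 4, 4, 4 | 10 to 20 | 16 |
| 8 | 1 x 4 | 10000 | [illegible] | 0.3 | [illegible] | 4, 4, 4, 4 | up to 50 | 16 |
| 9 | 1 x 4 | 10000 | [illegible] | [illegible] | [illegible] | 5, 3, 4, 4 | up to 70 | 15 |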

Table  4.2  Results  of  the  Four  Node  Network 


The first configuration selected was a linear 4 x 1 matrix of nodes. Its first run, run #1 in Table 4.2, did not produce the expected result of four groups of four. This run resulted in three categories with eight members in the third category. Four inputs were misclassified (misclassification being defined as placing an input in the wrong category). Figure 4.2 is a plot of the final weights. Inspection of this plot reveals that only two of the input spectra had been replicated accurately. At this point, the assumption was made that the network was stuck in a local minimum. Several additional runs were made, each with some modification to the original parameters. The problem was corrected by a simultaneous change to a larger neighborhood size (an increase in the neighborhood size parameter from 0.4 to 0.6 was sufficient) and an increase in the learning constant (from 0.4 to 0.5). In terms of the simulated annealing process, this amounted to shaking the network harder and allowing it to get out of the local minimum. As can be seen in Figure 4.3, run #2 identified the four spectra correctly.

As  has  been  mentioned  previously,  the  training  process  is  an  attempt  by  each  node  to 
take on the characteristics of one category of inputs. Figure 2.11 illustrates this process.
It  is  a  time  stacked  plot  of  the  weights  associated  with  run  #2.  Two  characteristics  of  the 
network  are  readily  apparent  in  this  plot: 

1)  As  the  energy  of  the  network  decreases,  it  will  stabilize  in  some  constant 
configuration. 

2)  Each  node  is  capable  of  accurate  replication  of  an  input  group. 

Once positive results were obtained, the effects of higher levels of noise were examined. Figure 4.4 is a plot of the four spectra with 20% noise added. The network continued to produce an accurate classification up to 50% noise (see runs 7-10). Figure 4.5 is a plot of the final weight values of run #7. Notice that some of the noise remains. Noise levels above 50% consistently caused misclassification of one or more of the inputs.
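The noise model described in Section A (uniform random numbers between 0 and 1, scaled by a constant and added to each point) can be sketched as follows. This is an illustrative reading, not the Appendix B program: the four spectra below are hypothetical stand-ins for the test spectra, the names `add_noise`, `build_inputs` and `noise_constant` are invented here, and the mapping from a quoted noise percentage to the scaling constant is not specified in this chapter.

```python
import random

def add_noise(spectrum, noise_constant, rng, allow_subtraction=False):
    """Perturb each point by a uniform random number in [0, 1) times noise_constant."""
    noisy = []
    for value in spectrum:
        delta = noise_constant * rng.random()
        # "In some cases subtraction": flip the sign half the time when enabled.
        if allow_subtraction and rng.random() < 0.5:
            delta = -delta
        noisy.append(value + delta)
    return noisy

def build_inputs(categories, copies, noise_constant, seed=0):
    """Replicate each category spectrum `copies` times, adding fresh noise to each copy."""
    rng = random.Random(seed)
    return [(label, add_noise(spectrum, noise_constant, rng))
            for label, spectrum in enumerate(categories)
            for _ in range(copies)]

# Four hypothetical 50-point spectra (assumed shapes, not the thesis's test spectra).
N_POINTS = 50
CATEGORIES = [
    [1.0] * N_POINTS,                                            # flat, low
    [5.0] * N_POINTS,                                            # flat, high
    [k / (N_POINTS - 1) * 5.0 for k in range(N_POINTS)],         # rising ramp
    [5.0 - k / (N_POINTS - 1) * 5.0 for k in range(N_POINTS)],   # falling ramp
]

# Four categories, four noisy members each: sixteen labelled inputs.
inputs = build_inputs(CATEGORIES, copies=4, noise_constant=0.5)
```

Seeding the generator makes a run repeatable, which is convenient when comparing the effect of one parameter at a time.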

The next step was to examine the other two possible four node configurations, 1 x 4 and 2 x 2. The 1 x 4 node array produced results which were not noticeably different from the 4 x 1. The 2 x 2 array, while again producing similar results, required repeated adjusting of the algorithm. On several runs, for reasons which are yet to be understood, one of the nodes was quickly eliminated from the updating process (see Figure 4.6, weights 150-200). Node number 4 was dropped from the training process and eight inputs were placed into category 3. The network often appeared to be stuck in what has been previously described as a local minimum. In all cases, once the appropriate parameters had been manipulated, the network performed a correct classification. Figure 4.7 is the plot of final weights for the same 2 x 2 array. In this case the problem was solved by increasing the number of training iterations from 10000 to 30000 and decreasing the learning constant from 0.4 to 0.2. Often several, and sometimes many, adjustments had to be performed prior to network convergence on the correct classification.

By the end of the four node network examination, a better feel had been gained for what is meant by the terms 'local minimum' and 'simulated annealing', and for the effects that adjusting key parameters, such as learning rate and iterations, had on them.
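To make the roles of these parameters concrete, the following is a minimal sketch of a one-dimensional Kohonen map that selects the winning node by the sum of absolute differences. It is not the Appendix B program. The linear decay of the learning rate and neighborhood, the random initial weights, and the integer `radius` argument are all assumptions made for illustration; in particular, the fractional neighborhood parameter d used in the tables is not defined in this chapter, so `radius` is only a stand-in for it.

```python
import random

def sum_abs_diff(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(sample, weights):
    """Index of the node whose weights are closest to the sample."""
    return min(range(len(weights)), key=lambda node: sum_abs_diff(sample, weights[node]))

def train_som(inputs, n_nodes, iterations, learning_constant, radius, seed=0):
    """Train a 1-D map; rate and neighborhood shrink linearly (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for step in range(iterations):
        remaining = 1.0 - step / iterations           # falls from 1 toward 0
        rate = learning_constant * remaining
        reach = int(round(radius * remaining))
        sample = rng.choice(inputs)
        winner = classify(sample, weights)
        # Move the winner and its neighbors within `reach` toward the sample.
        for node in range(max(0, winner - reach), min(n_nodes, winner + reach + 1)):
            weights[node] = [w + rate * (x - w) for w, x in zip(weights[node], sample)]
    return weights

# Toy data: two hypothetical categories with four identical members each.
inputs = [[0.0] * 5] * 4 + [[1.0] * 5] * 4
weights = train_som(inputs, n_nodes=2, iterations=500, learning_constant=0.5, radius=1)
labels = [classify(x, weights) for x in inputs]
```

In this sketch a larger `learning_constant` or `radius` moves more weights further on each step, which corresponds to the 'shaking the network harder' picture described above, while more iterations stretch the schedule out.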

Experience  with  the  four  node  network  produced  two  contradictory  opinions. 

1. Proper 'tweaking' of the network could lead to correct classification of spectral
data. 

2.  Any  tool  which  has  to  be  continuously  adjusted  in  order  to  obtain  useful  results 
will  have  limited  practical  applications. 

With  this  in  mind,  testing  of  higher  dimension  arrays  commenced. 

3.  Arrays  of  Larger  Size 

Arrays of larger size were considered and, as with the previous size arrays, many runs were conducted. An illustrative batch is presented in Table 4.3. As a group these networks performed admirably. With no noise added to the inputs, all configurations correctly classified the data. As noise levels increased, the networks over-classified the data (as opposed to a misclassification, over-classification is defined as creating a separate category where one is not desired). Over-classification increases as the number of nodes and noise levels increase, such that a 5 x 5 matrix classified the sixteen inputs, containing 10% to 20% noise, into sixteen separate categories. A positive note was that misclassifications did not occur. Figure 4.8 depicts the final weights of run #7. The replication process is quite evident.

For these networks increasing the number of iterations above 10,000 had almost no effect on the outcome. Small variations in the other parameters seemed to have little effect either.


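| Run # | Matrix Size | Iterations | d | a | N.C. | MEM | Noise (%) | Number Correct |
|---|---|---|---|---|---|---|---|---|
| 1 | 4 x 2 | 10000 | [illegible] | [illegible] | 4 | 4, 4, 4, 4 | 10 | 16 |
| 2 | 4 x 2 | 20000 | 0.4 | 0.4 | 5 | 4, 3, 1, 4, 4 | up to 40 | 15 |
| 3 | 8 x 1 | 10000 | 0.4 | [illegible] | 5 | 4, 4, 4, 3, 1 | 10 | 15 |
| 4 | 8 x 1 | 10000 | 0.4 | [illegible] | 4 | 4, 4, 4, 4 | 0 | 16 |
| 5 | 2 x 4 | 10000 | 0.4 | [illegible] | 4 | 4, 4, 4, 4 | 10 | 16 |
| 6 | 4 x 4 | 10000 | 0.4 | [illegible] | [illegible] | 4, 4, 4, 4 | 0 | 16 |
| 7 | 4 x 4 | 10000 | [illegible] | [illegible] | 10 | [illegible] | 10 | ? |
| 8 | 5 x 5 | 10000 | [illegible] | [illegible] | 16 | 1 | 10 to 20 | ? |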

Table  4.3  Results  of  Larger  Size  Arrays 


B.  SIMPLE  IMAGES 

Having  obtained  some  confidence  in  the  network's  ability  to  classify  spectra,  the  next 
step  was  to  determine  its  ability  to  work  with  an  image.  For  this  purpose  two  simple 
variable images were created.

1. Black and White Image

The first test image, Image 1, is depicted in Figure 4.9. It consists of four small light grey squares imposed on a large black square background. The image was created by setting the background equal to a small number when compared to the inner square fields (in this case 6 and 200 respectively). The image is displayed in a grey scale ranging from 0 for black to 255 for white. This image was then read into networks consisting of between two and sixteen nodes. Each run produced the same result, an accurate classification of the image into two groups (see Figure 4.10). Changes in iterations, learning rate and neighborhood size had no effect on the output.

Noise was then added to the data, which changed the image to look like Figure 4.11. The two node array correctly classified the image, for noise levels up to 50%, into two categories (see Figure 4.12). This result was expected, given that the network was forcing the image into two categories. At noise levels above 50%, members of the inner squares could end up being closer in magnitude to the black exterior. Figure 4.13 exhibits the result of 55% noise added to the upper right square. As before, when more nodes were used to classify a noisy image, an over-classification occurred.

2.  Variable  Grey  Image 

Image 2, depicted in Figure 4.14, closely resembles Image 1. It was created by
changing the values in three of the inner squares to 50, 100, and 150, respectively.
Networks consisting of five or more nodes correctly categorized Image 2
into five separate groups. A five node network output appears in
Figure  4.15.  Close  analysis  of  these  two  figures  reveals  that  the  inner  squares  have  not 
kept  their  same  relative  brightness.  This  reveals  that  the  network  is  assigning  values  to 
groups  according  to  their  relative  placement  in  the  nodal  array  and  not  according  to  any 
specific  characteristic  of  the  inputs  associated  with  that  group. 

A two node array produced the image depicted in Figure 4.16. Adding noise to Image 2 produced results consistent with those described in the analysis of Image 1.

C. VARIABLE INTENSITIES OF SIMPLE TEST SPECTRA

Spectral reflectivity curves, while maintaining a given shape, vary as the intensity of the illuminating source changes. It is important that a classification scheme be able to group two identical objects, one in shade and one in sunlight, as members of the same category. In order to test the Kohonen network's ability to classify spectra according to curve shape and not intensity, the spectra described in part A of this chapter have been modified such that two members of each group exist, with each being offset by some constant from the other. Figure 4.17 plots the resulting spectra. Ideal classification would result in four categories of two members, each being of the same shape.

Many  network  configurations  were  put  forward  in  an  attempt  to  classify  this  data  set 
correctly,  with  none  yielding  the  desired  results.  Furthermore,  manipulation  of 
iterations,  learning  rate,  and  neighborhood  size  proved  to  be  futile.  Unless  the  offset  was 
comparatively  small,  the  network  grouped  the  spectra  according  to  an  average  intensity. 
A  change  in  how  the  winning  neuron  was  selected  was  in  order. 

All  runs  up  to  this  point  in  the  analysis  had  been  conducted  using  the  sum  of  absolute 
differences (Eq. 2.9) as the method of determining the most similar neuron. It stands to
reason that adding up the absolute distances between each corresponding point would
equate  to  a  comparison  of  average  intensity.  A  method  of  determining  curve  shape 
similarity  was  needed. 

The  degree  of  similarity  of  any  two  vectors  in  space  is  determined  by  taking  the  dot 
product  between  such  vectors.  With  this  in  mind,  Eq.  2.10  was  selected  for  the  described 
purpose.  The  results  of  this  change  were  impressive.  Not  only  were  the  spectra  properly 
classified  according  to  curve  shape,  but  the  time  to  complete  a  training  iteration  and 
classify  inputs  was  decreased  dramatically.  With  these  results  in  hand,  real  data  were 
selected for analysis.

D. REAL DATA

1.  Camouflage  Image 

The first data set was taken by Bruce Raffert and Glen Sellars as part of work being done by Kestrel Corp. in conjunction with Phillips Laboratory. A SMIFTS type hyperspectral instrument, built by Raffert, was used to collect the spectral reflectivity of a piece of cloth camouflage. The cloth was placed fifty feet from the instrument, draped over a piece of wooden board and leaned against a metal pole. The background consisted of short grass. The image was collected sequentially in 99 vertical swaths, starting at the left edge and ending at the right edge. The data were collected into 100 bandwidths which progress linearly in wavenumber, such that bandwidth number 34 corresponds to 6328 Å (angstroms). The data form what is known as a hypercube, with the x and y dimensions being spatial and z being frequency. Figure 4.18 depicts the data in its hypercube form. Figure 4.19 is a plot of band 37.

The  objective  of  this  analysis  was  to  determine  the  network's  ability  to  distinguish 
between  the  camouflage  and  the  background.  To  show  that  significant  differences  do 
exist  in  the  spectral  response  of  the  camouflage  and  the  background  grass,  spectra  taken 
from  each  are  plotted  in  Figure  4.20.  The  line  with  the  highest  intensity  value  is  the 
background  grass,  the  middle  line  is  from  the  light  portion  of  the  camouflage  and  the 
dashed  line  is  from  the  dark. 

Both  of  the  previously  described  methods  of  determining  the  winning  neuron  were 
used (i.e., the sum of absolute differences and the dot product). The process and results of
each  are  given  below. 

a.  Sum  of  Absolute  Differences 

An ideal classifying scheme would separate the image into three categories, the background, the camouflage, and the pole, and, most importantly, all background pixels would be classified as different from the camouflage. Because the camouflage is made of two different colors, it is unlikely that all pixels in the camouflage would be classified the same. A more realistic hope would be to distinguish the camouflage as two different groups, each being separate from the background.

In the pursuit of this goal, the image was read into networks of varying sizes (two to eight nodes). Regardless of the network size, the algorithm classified the image according to what the author believes is intensity. This conclusion is based on the similarity in shape of each set of the final weights. To illustrate this effect, Figure 4.21 is presented. It is a plot of the final weights associated with a 3 x 1 array. The network appears to be replicating the average spectrum of pixels of like intensity. Figure 4.22 shows the categories produced by the same network. Notice that, while the camouflage is distinguishable from the background, many pixels in both the camouflage and the background are classified as members of the same group.

Several attempts were made to preprocess the data so that the network would classify by spectral curve shape. These are listed below:

1. Normalization of the maximum intensity value in each pixel's spectrum to the overall maximum intensity value of the image.

2. Normalization of the average intensity value in each pixel's spectrum to the average intensity value of the entire image.

3. Retransformation of the first three principal components back into frequency space. This acts as a low pass filter.

These  attempts  produced  results  that  were  no  better  than  the  initial  runs. 
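The first two preprocessing steps listed above can be read in more than one way. The sketch below shows one possible reading, assumed here for illustration only: each pixel's spectrum is rescaled multiplicatively so that its maximum (or its mean) matches the corresponding value for the whole image. The function names and the toy data are hypothetical, and strictly positive intensities are assumed.

```python
def normalize_to_image_max(pixels):
    """Scale each pixel's spectrum so its peak equals the largest value in the image."""
    image_max = max(max(spectrum) for spectrum in pixels)
    return [[v * image_max / max(spectrum) for v in spectrum] for spectrum in pixels]

def normalize_to_image_mean(pixels):
    """Scale each pixel's spectrum so its mean equals the mean of the whole image."""
    image_mean = sum(sum(s) for s in pixels) / sum(len(s) for s in pixels)
    normalized = []
    for spectrum in pixels:
        pixel_mean = sum(spectrum) / len(spectrum)
        normalized.append([v * image_mean / pixel_mean for v in spectrum])
    return normalized

# Two hypothetical pixels with the same shape, one twice as bright as the other.
pixels = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]]
by_max = normalize_to_image_max(pixels)
by_mean = normalize_to_image_mean(pixels)
```

Under this reading, two spectra that differ only by a scale factor become identical after either step. Spectra that differ by an additive offset would not, since a multiplicative rescaling leaves the offset in place.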

b.  Correlation  Coefficient 

Using the correlation coefficient to obtain the 'distance' to the winning neuron worked much better. Again, the image was read into networks of varying sizes, this time with dramatically different results. Figure 4.23 is a plot of the final weights associated with a 3 x 1 array. By comparing Figures 4.20, 4.21, and 4.23 it is evident that the network is no longer distinguishing between groups according to intensity, and is now looking at spectral curve shape. Figure 4.24 is the classification produced by this same network. Notice that very few pixels in the background are classified as being in the same group as the camouflage, and only a small number of camouflage pixels are misclassified as background pixels. Also note that the camouflage is broken into two distinct categories. The fact that the pole is not placed into a separate category is a problem that is not completely understood. One possibility is that the pole is a small portion of the image and is not dramatically different spectrally from the darker portion of the camouflage.
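The difference between the two winner-selection measures can be seen in a small toy case. The sketch below is illustrative only: Eqs. 2.9 and 2.10 are not reproduced in this chapter, so the Pearson correlation coefficient is assumed here as the shape-based measure, and the node weights and pixel values are invented.

```python
import math

def sum_abs_diff(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def correlation(a, b):
    """Pearson correlation coefficient (undefined for a perfectly flat vector)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return numerator / denominator

# Hypothetical trained weights for two nodes.
node_weights = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # node 0: rising shape, low intensity
    [8.0, 7.0, 6.0, 5.0, 4.0],   # node 1: falling shape, high intensity
]
# Hypothetical pixel: the rising shape of node 0, offset upward by a constant 3.
pixel = [4.0, 5.0, 6.0, 7.0, 8.0]

# Smallest difference wins for the summation measure; largest correlation wins otherwise.
winner_by_difference = min(range(2), key=lambda j: sum_abs_diff(pixel, node_weights[j]))
winner_by_correlation = max(range(2), key=lambda j: correlation(pixel, node_weights[j]))
```

Here the summation measure assigns the pixel to node 1, whose average intensity is closest, while the correlation measure assigns it to node 0, whose shape it shares. This mirrors the intensity-based versus shape-based grouping reported for the camouflage image.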

2.  Operation  Desert  Radiance  Image 

The  second  data  set  is  the  result  of  airborne  hyperspectral  remote  sensing  collections 
conducted  at  White  Sands,  New  Mexico  in  October  1994,  as  part  of  an  experiment  called 
Operation  DESERT  RADIANCE.  The  primary  objective  of  this  experiment  was  to 
determine  the  ability  of  remotely  sensed  data  to  detect  and  identify  a  target  under 
camouflage.  The  data  set  used  in  this  analysis  was  collected  by  an  airborne  AAHIS 
instrument,  flown  at  2000  feet.  In  the  scene  are  several  target  panels  and  an  M-60  tank 
covered  in  desert  camouflage.  The  background  mostly  consists  of  brown  sandy  dirt. 
Small  bushes  and  shrubs  are  present  throughout  the  scene. 

The  first  objective  was  to  see  if  the  network  could  identify  the  tank.  Many  attempts 
were  made  at  accomplishing  this  with  no  success.  In  a  last  ditch  effort,  a  4  x  2  network 
was  trained  on  an  area  of  pixels  of  which  most  (approximately  75%)  involved  the  tank. 
The  tank  was  still  divided  into  several  different  categories.  Because  the  data  were  taken 
in  the  visible  wavelengths,  and  camouflage  is  designed  to  inhibit  visible  detection,  this 
task  proved  to  be  too  difficult  for  the  network  to  accomplish.  What  the  network  did 
show  was  that  the  spectral  signature  of  the  target  was  not  significantly  different  from  its 
background. In other words, the camouflage did its job. Based on the results obtained
from the first data set, the author believes that, had the data included the near IR
wavelengths,  the  network  would  have  been  able  to  identify  the  target. 

The network did isolate several interesting categories. These are presented in Figures 4.25, 4.26, 4.27, and 4.28. Figure 4.25 shows an instrument calibration panel, which was singled out as a separate category. The author believes everything in Figure 4.26 to be vegetation of some kind, mostly small bushes and shrubs. The categories in Figures 4.27 and 4.28 both appear to be target panels, set out as part of the experiment. Lack of adequate ground truth prevents further analysis of these groups.

3.  Run  Times 

The  correlation  coefficient  method  of  measuring  similarity  resulted  in  a  much 
quicker  algorithm.  Test  runs  were  done  just  for  this  comparison.  (Typical  training  for 
images  took  much  longer,  on  the  order  of  one  to  three  hours.)  Consecutive  runs  were 
executed  on  the  first  data  set,  holding  all  parameters  constant.  Table  4.4  lists  several 
network  configurations  and  their  run  times. 


| Size | Time (sec), Dot Product | Time (sec), Summation |
|---|---|---|
| 2 x 1 | 72 | 206 |
| 4 x 1 | 105 | 390 |
| 4 x 2 | 176 | 755 |

Table  4.4  Run  Time  Comparison 


Another time measurement of interest is the time it takes to categorize an image. For single category classification, looking for one item of interest in an image, the first camouflage image averaged 19 seconds, while the DESERT RADIANCE image averaged 31 seconds. These numbers are significant when trying to determine this process's ability to perform real-time data processing. The DESERT RADIANCE image is 192 pixels wide by 512 pixels long. If each pixel is 10 feet across, the image is then 5120 feet long. A processing time of 31 seconds for 5120 feet of ground track (about 165 feet per second) allows the collecting device to travel at roughly 113 miles per hour. This appears to have potential for real-time classification (target identification) once a network has been trained.

V. CONCLUSIONS


Hyperspectral  imagery  was  analyzed  using  the  Kohonen  Self-Organizing  Map 
neural  network.  Hyperspectral  data  was  successfully  read  into  the  network,  organized  and 
categorized. 

The first hyperspectral image to be analyzed was one consisting of a piece of camouflage cloth surrounded by a grassy background. The data contained 100 spectral bands that extended from approximately 450 nm to 1040 nm. The network, using the correlation coefficient to determine the nearest neuron, was able to accurately distinguish between the camouflage and the background. The absolute difference algorithm converged on intensity.

The second data set was taken as part of Operation DESERT RADIANCE. It consisted of an arid landscape separated into 70 spectral bands ranging from 440 nm to 870 nm. Placed in the image were several target panels, an M-60 tank covered by camouflage and a calibration panel. The data were collected only in the visible frequencies. The network did not identify the tank. However, it did single out other objects as distinct groups in the scene. Major Mat Fay analyzed this same scene using coordinate rotation methods with some success at isolating the tank (Fay, 1995). The contrast between the two approaches is instructive. The neural net technique is effective at identifying major categories in a hyperspectral image. Rotations in spectral space seem to offer more promise in isolating minority elements.

This  analysis  has  led  the  author  to  believe  that  the  best  approach  to  using  this 
neural  net  technique  is  to: 

1.  Take  a  representative  segment  of  whatever  environment  is  to  be  classified 
containing  the  item  or  items  of  interest. 

2.  Train  the  network  with  this  data  set. 

3.  Determine  if  the  item  of  interest  is  singled  out  as  a  separate  category. 

If so, then a pretrained network operating in similar environments should be able to
identify the item of interest. Real-time processing is a definite possibility.
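
The three steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the thesis program: it uses a one-dimensional map instead of the two-dimensional map of Appendix B, synthetic four-band spectra in place of real imagery, and hypothetical names throughout (`train_som`, `best_node`, `item_is_separate`).

```python
import math
import random

def correlation(a, b):
    """Normalized dot product between two spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def best_node(weights, spectrum):
    """Index of the map node whose weights correlate best with the spectrum."""
    return max(range(len(weights)), key=lambda j: correlation(weights[j], spectrum))

def train_som(pixels, n_nodes, n_iter, alpha0=0.4, seed=0):
    """Train a 1-D self-organizing map with a shrinking block neighborhood."""
    rng = random.Random(seed)
    n_bands = len(pixels[0])
    mean = [sum(p[k] for p in pixels) / len(pixels) for k in range(n_bands)]
    # Start every node near the mean spectrum with a small random perturbation
    weights = [[m * (1.0 + 0.01 * rng.random()) for m in mean] for _ in range(n_nodes)]
    radius0 = n_nodes / 2.5
    for t in range(n_iter):
        decay = 1.0 - t / n_iter          # linear decay of learning rate and radius
        alpha, radius = alpha0 * decay, radius0 * decay
        spectrum = rng.choice(pixels)     # randomly chosen training spectrum
        win = best_node(weights, spectrum)
        for j in range(n_nodes):
            if (j - win) ** 2 <= radius * radius:
                weights[j] = [w + alpha * (v - w) for w, v in zip(weights[j], spectrum)]
    return weights

def scaled(shape, factor):
    return [factor * v for v in shape]

# Step 1: a synthetic "representative segment" (assumed spectra, varying brightness)
rng = random.Random(1)
background = [scaled([1.0, 2.0, 3.0, 4.0], rng.uniform(0.5, 1.5)) for _ in range(80)]
item = [scaled([4.0, 3.0, 2.0, 1.0], rng.uniform(0.5, 1.5)) for _ in range(20)]

# Step 2: train the network on the segment
weights = train_som(background + item, n_nodes=2, n_iter=500)

# Step 3: check whether the item of interest falls into its own category
item_categories = {best_node(weights, p) for p in item}
background_categories = {best_node(weights, p) for p in background}
item_is_separate = item_categories.isdisjoint(background_categories)
print("item singled out:", item_is_separate)
```

If `item_is_separate` is true for the training segment, the trained `weights` can be reused with `best_node` alone to classify new pixels, which is the inexpensive step relevant to real-time use.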

In order to demonstrate conclusively the applicability of this network to the analysis of
hyperspectral imagery, several data sets with well-defined, highly accurate ground truth
are needed.

APPENDIX A


Figure 2.1 Energy surface representation of neural network.


Figure  2.2  Typical  neural  node. 


Figure  2.3  Learning  mode  block  diagram. 


Figure  2.4  Sigmoid  Function 


Figure 2.5 Feed forward neural network containing hidden layers.


Figure  2.6  Example  problem  network. 

Figure 2.7 Problem definition.

Figure 2.8 Solution to example problem.

Figure 3.1 Spectral Reflectance Plot (Rinker, 1990)


Figure  3.2  AAHIS  Block  Diagram 

Figure 3.3 Michelson Interferometer (Klein, 1970)


Figure  3.4  Michelson  Interferometer  for  use  in  Spectroscopy  (Klein,  1970) 

Figure 3.5 Schematic of SMIFTS Instrument (Lucey,

Results From Four Node Network [4 X 1 ] Run


Figure  4.2  Plot  of  Final  Weights  for  Four  Node  Network,  Run  #  1 

Figure 4.6 Plot of Final Weights for Four Node Network, 2X2, Run # 5

Figure 4.7 Plot of Final Weights for Four Node Network, 2X2, Run # 6


Figure 4.11 Test Image with Noise

Figure 4.15 Five Node Network Categorization

Figure 4.18 Hypercube of Camouflage Data

Figure 4.19 Band 37 of Camouflage Data

Figure 4.20 Representative Spectra of Camouflage Data


Figure  4.22  Sum  of  Absolute  Difference  Categorization  of  Camouflage  Data 

Original Data


Category  B 


Figure  4.26  Network  Category  B  for  Desert  Radiance  Data,  Shrubs 

Figure 4.28 Network Category D for Desert Radiance Data, Target Panels


APPENDIX  B 


;&&&&&& to be used with bsq data cube &&&&&&&&&&&&&&&&&&&&
;&&&&&&  for  hyperspectral  imagery  &&&&&&&&&&&&&&&&&&&& 

set_plot,  'x'  &  device,  pseudo  =  8,  retain  =  2 

;  initialize  network  size 

maxgdn  =  4  &  maxgde  =  4  &  maxit  =  5e3  ;  adjust  maxitr . 

xittr  =  0 


xsize  =  99  &  ysize  =  300  &  zsize  =  100 

nvar  =  zsize  &  ncase  =  xsize  *  ysize 

x2 = fltarr(nvar+1,ncase+1) & xm = fltarr(nvar)

wei  =  fltarr(ncase)  &  values  =  fltarr(nvar) 

list  =  intarr(xsize,ysize)  &  id  =  intarr(ncase) 

num  =  intarr(ncase) 

g  =  fltarr(nvar,  maxgde,  maxgdn) 

file2 = 'data.cube' ; data file is arranged as bsq
file7 = 'results.dat'

print, ' input file ', file2

print, ' output file ', file7

print,  'Number  of  cases  to  be  read: ',  ncase 

print,  'Number  of  variables  to  be  read: ',  nvar 

print, 'Two Dimensional SOM has ', maxgde, ' rows and ', maxgdn, ' columns'
print,  'Number  of  training  cases ',  maxit 

;  neighborhood  size  function 
lat  =  0 

if  (lat  eq  0)  then  print,  'simple  block  party  used' 

if  (lat  ne  0)  then  print, '  Gaussian  lateral  inhibition  function  used' 

;** set some constants to do with training to some arbitrary values

alphaO  =  0.4  &  alpha  =  alphaO 

tmp  =  [maxgde,  maxgdn] 

dstepO  =  max(tmp)/2.5  &  dstep  =  dstepO 

dmaxtr  =  maxit 

;****  initialize  means 

xm(*)  =  0.0  &  idim  =  1  &  idum  =  0 

;***  read  data 


ire  =  1  &  weight  =  fltarr(ncase) 

camo  =  intarr(xsize,ysize,zsize) 
openr,  3,  file2 
forrd,  3,  camo 
close,  3 

t  =  total(camo,3)/100 
x2  =  reform(t,xsize*ysize) 

X  =  camo 

for  i  =  0, 99  do  begin 
tvscl,x(*,*,i) 
endfor 

weight  =  .6 

print, '  all  data  read  ok ' 


;**************************************************

; calculate variable means
xm  =  total(x2)/ncase 


seed  =  0.0 

;  initialize  map  weights 

tmp  =  0.01*xm*randomu(  seed,  nvar,  maxgde,  maxgdn) 
for  k  =  0,  nvar-1  do  begin 

g(k,0:maxgde-1,0:maxgdn-1) = xm + tmp(k,0:maxgde-1,0:maxgdn-1)*xm/3.0
endfor 

plot,g  &  wait,  1  &  erase 

; other inits and form cumulative sampling sums
id(*) = 0 & wei(*) = 0


movers  =  0  &  kik  =  0  &  nup  =  0 

sumsam = wei(ncase-1)

dcon  =  2.0/(dstep*dstep)  &  print,  dstep,dcon 


aver  =  0.0  &  n=0 


;**************************************************
;
; training process starts here

for iterl = 0L, maxit-1 do begin
;  select  training  case  at  random  but  proportional  to  weighting: 
n=  n  + 1 

kk  =  intarr(xsize)  &  kk2  =  intarr(ysize) 

k = fix((xsize-1)*randomu(s)) ; index into x-dim of data

k2 = fix((ysize-1)*randomu(s)) ; index into y-dim of data

values = x(k,k2,0:nvar-1)


;  randomly  chosen  spectrum 


;  locate  the  neuron  closest  to  the  randomly  chosen  input  spectrum 
;  use  correlation  coefficient  to  determine  winner 

best  =  .001  ;  compare  to  each  neuron 

for  j  =  0,  maxgde-1  do  begin 
for  k  =  0,  maxgdn-1  do  begin 

sum  =  0L  &  mag_g  =  0L  &  mag_v  =  0L 
a  =  g(*,J,K)  &  a  =  reform(a)  ;  make  sure  vector  is  1-D 

b = values(*) & b = reform(float(b))
c  =  a*b 

sum  =  total(c)  &  mag_g  =  sqrt(total(a*a))  &  mag_v  =  sqrt(total(b*b)) 
corr  =  sum/(mag_g*mag_v) 

if  (corr  gt  best)  then  begin 
best  =  corr  &  indj  =  j  &  indk  =  k 
endif 

endfor 

endfor 

;********  now  have  location  of  neuron  which  most  closely  equals  the  data  ***** 
aver  =  aver  +  best 
;  set  classification 

new  =  (indj  -1)*  maxgde  +  indk 
if (new ne id(k*k2)) then begin
movers  =  movers  +1 
id  (kk)  =  new 
endif 

; set distance threshold for block neighborhood
dsq  =  dstep  *  dstep  &  dcon  =  2.0/dsq 

; now update best matched neuron AND its neighbours
; (j and k index the map in the same order as in the search loop above)
for j = 0, maxgde-1 do begin
  for k = 0, maxgdn-1 do begin
    ; calculate squared distance from winning neuron
    dis = (j-indj)^2 + (k-indk)^2

    ; compute weights due to lateral inhibition
    if (lat eq 0) then begin
      if (dis gt dsq) then goto, c200
      syn = alpha
    endif else begin
      gaus = dis/dcon
      if (gaus gt 20.0) then begin
        gaus = 0.0
      endif else begin
        gaus = exp(-gaus)
      endelse
      syn = alpha * gaus ; scale the learning rate by the Gaussian falloff
    endelse

    ; update weights
    if (syn ne 0.0) then begin
      nup = nup + 1
      for L = 0, nvar-1 do begin
        G(L,J,K) = G(L,J,K) + syn * (values(L) - G(L,J,K))
      endfor
    endif

    c200:

  endfor

endfor

; print something every now and again
kik = kik+1

if (kik eq 1e2) then begin
aver = aver/1e4

plot,  g,  title  =  'Iteration  '  +  string(iterl),  ytitle  =  'weights' 
print,  'iteration  Number: ',  iterl 
print,  'Moves: ',  movers, '  Dstep: ',  dstep 
print,  'alpha: ',  alpha,  '  Updates: ',  nup 
print,  'correlation ',  best 
movers  =  0  &  kik  =  0  &  aver  =  0.0  &  nup  =  0 
endif 

;  change  training  parameters 

;  there  will  be  a  dramatic  change  in  the  training  process  when 
;  the  neighborhood  shrinks  to  the  point  where  single  neurons  are 
;  being  trained 

alpha  =  alphaO*  (1.0  -  iterl/dmaxtr) 
dstep  =  dstepO*  (1.0  -  iterl/dmaxtr) 

endfor  ;  iterl  loop  at  top 

;**** end of training ***************************************************
print,  'training  is  completed ' 
set_plot,  'PS' 

device,  file='weights.ps',  /portrait,  bits_per_pixel=8,  /close_file,  /inches,  $ 
yoffset  =  2.5,  ysize  =  6.0,  xoffset  =  1.25,  xsize  =  6.0 
plot,  g,  xtitle  =  'Weights',  ytitle  =  'Intensity',  title  =  'Network  Weights' 
device,  /close_file 

;************** classify data


;  sequence  through  each  pixel 

for  i  =  0,  xsize-1  do  begin 
for  w  =  0,  ysize-1  do  begin 

best  =  .001 

;  sequence  through  each  neuron 
for  j  =  0,  maxgde  -1  do  begin 
for  k  =  0,  maxgdn  -1  do  begin 

sum = 0L & mag_g = 0L & mag_v = 0L
a = g(*,J,K) & a = reform(a) & b = x(i,w,*) & b = reform(float(b))
c = a*b
sum = total(c)

mag_g = sqrt(total(a*a)) & mag_v = sqrt(total(b*b))
corr = sum/(mag_g*mag_v)

if  ( corr  gt  best)  then  begin 

best  =  corr  &  indj  =  j  &  indk  =  k 

endif 

endfor 

endfor 

;  list  is  an  array  that  has  the  same  size  as  the  original  image 
;  the  values  stored  in  list  are  the  categories  for  each  pixel  element 
list(i,w)  =  (indj)  *  maxgde  +  indk+1 

endfor ; end of the w loop
print, 'categorizing column: ', i, ' of ', xsize
endfor ; end of the i loop

;************* done with categorization **************

print, 'done with categorization'


list=50*list  ;  a  little  redundant,  with  tvscl  next.... 
tmp  =  size(list) 
set_plot,  'x' 

window, 1, xsize = tmp(1) + 40, ysize = tmp(2) + 60, title = 'results'
tvscl, list, 20, 30
loadct, 13

set_plot,  'PS' 

device,  file='results.ps',  /portrait,  /close_file,  /inches,  $ 
yoffset  =  2.5,  ysize  =  6.0,  xoffset  =  3.25,  xsize  =  6.0,  $ 
bits_per_pixel=8 
tvscl,list 

device,  /close_file 

openw, 7, file7
printf, 7, list
close, 7

set_plot,  'x' 

end 